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Abstract 


An  integration  of  objects  and  databases  provides  a  framework  in  which  applications 
take  advantage  of  the  high  productivity  and  reusability  of  an  object-oriented  soft¬ 
ware,  and  at  the  same  time  the  sharability  and  maintainability  of  databases.  One  of 
the  approaches  for  achieving  this  integration  is  to  instantiate  objects  from  relational 
databases  through  views.  In  this  approach,  a  view  is  defined  by  a  relational  query 
and  a  function  for  mapping  between  object  attributes  and  relation  attributes.  The 
query  is  used  to  materialize  the  necessary  data  into  a  relation  from  database,  and  the 
function  is  used  to  restructure  the  materialized  relation  into  objects. 

The  approach  of  instantiating  objects  fiom  relational  databases  through  views 
provides  an  effective  mechanism  for  building  object-oriented  applications  on  top  of 
relationeJ  databases.  However,  a  system  built  in  such  a  framework  has  the  overhead  of 
interfacing  between  two  different  models  -  ^  object-oriented  model  and  the  relational 
model  -  in  terms  of  both  functionality  and  performance.  In  this  thesis,  we  address  two 
important  problems:  the  outer  join  problem  and  the  instantiation  efficiency  problem. 

Outer  join  problem:  In  instantiating  objects,  tuples  that  should  be  retrieved 
from  databases  may  be  lost  if  we  allow  only  inner  joins.  Hence  it  becomes  necessary 
to  evaluate  certain  join  operations  of  the  query  by  outer  joins,  left  outer  joins  in 
particular.  On  the  other  hand,  we  sometimes  retrieve  unwanted  nulls  from  nulls 
stored  in  databases,  even  if  there  is  no  null  inserted  during  query  processing.  In  this 
case,  it  is  necessary  to  filter  some  relations  with  selection  conditions  which  eliminate 
the  tuples  containing  null  attributes  in  order  to  prevent  the  retrieval  of  unwanted 
nulls.  We  develop  a  mechanism  for  making  the  system  generate  those  left  outer  joins 
and  filters  as  needed  rather  than  requiring  that  a  programmer  specifies  it  manually 


IV 


as  part  of  the  query  for  every  view  definition.  We  also  address  how  to  reduce  the 
number  of  left  outer  joins  and  filters  for  reducing  the  query  processing  time. 

Instantiation  efficiency  problem:  Since  the  advent  of  the  relational  databases, 
it  has  been  universally  accepted  that  a  query  result  is  retrieved  as  a  single  flat  rela¬ 
tion  (a  table).  Such  a  relation  is  neither  normalized  nor  nested  if  the  query  includes 
joins  and  has  redundancies.  This  single  table  concept  is  not  useful  in  our  framework 
because  a  client  wants  to  retrieve  object  instances.  Rather,  a  single  flat  relation  con¬ 
tains  data  redundantly  inserted  just  to  make  the  quer}'  result  ‘flat’.  These  redundant 
data  convey  no  extra  information  but  only  degrade  the  performance  of  the  sj^stem. 
This  fact  motivated  us  to  look  into  different  methods  which  reduce  the  amount  of 
data  that  the  system  must  handle  to  instantiate  objects,  without  diminishing  the 
amount  of  information  to  be  retrieved.  In  this  thesis,  we  present  two  alternative 
methods  which  retrieve  a  query  result  in  less  redundant  structures  than  a  single  flat 
relation.  Our  result  demonstrates  that  these  two  methods  incur  far  less  cost  than 
the  method  of  retrieving  a  single  flat  relation.  We  assume  a  computing  environ¬ 
ment  that  is  a  client-server  architecture,  where  relational  databases  reside  on  servers 
and  applications  reside  on  connected  workstations.  Main  memory  database  systems 
will  benefit  most  from  our  work,  although  our  work  is  useful  for  secondary  storage 
database  systems  as  well. 
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Chapter  1 
Introduction 


We  have  seen  increasing  effort  for  supporting  object-oriented  applications  with  databases. 
One  of  the  approaches  for  this  effort  is  to  instantiate  objects  from  relational  databases 
through  views  [14,  16,  17,  19,  8,  10,  12].  A  view  is  defined  by  a  relational  query  and  a 
function  for  mapping  between  object  attributes  and  relation  attributes.  The  query  is 
used  to  materialize  the  necessary  data  into  a  relation  from  databases,  and  the  function 
is  used  to  restructure  the  materialized  relation  into  objects. 

The  approach  of  instantiating  objects  from  relational  databases  through  views 
provides,  an  effective  mechanism  for  building  object-oriented  appHcations  on  top  of 
relational  databases.  Example  applications  are  engineering  design  software  such  as 
computer-aided  design  (CAD)  or  computer-aided  software  engineering  (CASE).  These 
appHcations  become  more  effective  by  utiHzing  the  locality  and  information  encapsu¬ 
lation  available  from  an  object-oriented  approach.  Complex  objects  [29,  30,  31,  44, 
45,  46,  24]  are  typically  needed  in  these  appHcations.  Relational  databases  provide 
sharing  and  flexibiHty,  whose  benefit  becomes  magnificent  as  the  size  of  databases 
become  larger.  A  system  built  in  such  a  framework  has  the  overhead  of  interfacing 
between  two  different  models  -  an  object-oriented  model  and  the  relational  model  - 
in  terms  of  both  functionaHty  and  performance.  In  this  thesis,  we  address  two  im¬ 
portant  problems:  outer  join  [37]  problem  and  instantiation  efficiency  problem.  The 
outer  join  problem  is  a  functionaHty  problem  as  well  a  performance  problem,  while 
the  instantiation  efficiency  problem  is  entirely  a  performance  problem. 
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CHAPTER  1.  INTRODUCTION 


1.1  Outer  Join  Problem 

In  instantiating  objects,  some  particiilar  conditions  arise  that  are  not  so  common 
in  traditional  relational  database  operations.  First  of  all,  as  will  be  shown  in  Sec¬ 
tion  3.2.2.1,  it  often  happens  that  we  lose  tuples  that  should  be  retrieved  from 
databases,  if  we  allow  only  inner  joins.  Hence,  it  becomes  necessary  to  evaluate  some 
joins  of  the  query  by  outer  joins.  In  particular  we  need  unidirectional  outer  joins  such 
as  left  outer  joins  [37].  On  the  other  hand,  we  sometimes  retrieve  unwanted  nulls  from 
nulls  stored  in  databases,  even  if  there  is  no  null  inserted  during  query  processing. 
In  this  case,  it  is  necessary  to  filter  some  relations  with  selection  conditions  which 
eliminate  the  tuples  containing  null  attributes  to  prevent  the  retrieval  of  unwanted 
nulls . 

It  is  desirable  to  make  the  system  generate  those  left  outer  joins  and  filters  as 
needed  rather  than  requiring  that  a  programmer  specifies  them  manually  as  part  of 
the  query  for  every  view  definition.  We  develop  such  a  mechanism  in  the  first  part  of 
this  thesis. 

Without  optimization,  declarative  approaches  such  as  SQL  queries  and  views  are 
not  practical.  However,  optimization  of  queries  with  outer  joins  has  rarely  been 
treated.  Since  left  outer  joins  are  not  symmetric,  they  inhibit  a  query  optimizer  from 
attempting  to  reorder  joins  for  more  efficient  query  processing.  Furthermore,  appli¬ 
cation  of  non-null  filters  is  not  free.  It  incurs  the  cost  of  evaluating  the  corresponding 
selection  predicates  on  a  base  relation.  We  show  that,  for  certain  cases  that  occur 
frequently,  these  two  operators  can  be  avoided  without  affecting  the  query  result. 


1.2  Instantiation  Efficiency  Problem 

The  chent-server  architecture  is  becoming  a  standard  architecture  in  modern  comput¬ 
ing  environment.  In  the  client-server  architecture,  object-oriented  applications  run 
on  client  workstations  and  access  data  stored  in  remote  database  servers.  A  view 
pertinent  to  an  object  type  contains  a  relational  query,  which  is  delivered  to  a  remote 
database  server;  The  query  result  is  retrieved  from  a  server  and  is  restructured  into 


1.3.  ORGANIZATION  OF  THE  THESIS 
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nested  relations  [70,  71,  72]  by  a  client. 

Since  the  advent  of  the  relational  databases  [26],  it  has  been  universally  accepted 
to  retrieve  a  query  result  as  a  single  flat  relation  or  a  table.  In  fact,  one  of  the 
advantages  of  the  relational  model  is  that  it  enables  us  to  apply  the  same  language  (a 
relational  query)  uniformly  on  both  base  relations  and  query  results.  However,  this 
concept  is  not  useful  in  our  work  because  what  a  client  wants  to  retrieve  is  a  nested 
relation,  not  a  flat  relation.  Rather,  a  single  flat  relation  contains  data  redundantly 
inserted  just  to  make  the  query  result  ‘flat’.  These  redundant  data  convey  no  extra 
information  but  only  degrade  the  performance  of  the  system.  Certainly  it  will  be 
more  efficient  to  manipulate  less  data  as  long  as  we  retrieve  the  same  information. 

In  the  second  part  of  this  thesis,  we  present  two  alternative  methods  of  instan¬ 
tiating  objects  from  remote  relational  databases  through  views.  The  two  methods 
retrieve  a  query  result  in  other  structures  than  a  single  fiat  relation.  One  method 
retrieves  a  set  of  relation  fragments  and  the  other  method  retrieves  a  single  nested 
relation.  We  will  demonstrate  that  these  two  methods  incur  far  less  cost  than  the 
method  of  retrieving  a  single  flat  relation. 


1.3  Organization  of  the  Thesis 

Following  this  introduction,  we  describe  the  background  framework  of  our  work  in 
Chapter  2.  Then,  the  outer  join  problem  and  the  instantiation  efficiency  problem  are 
addressed  respectively  in  Chapter  3  and  Chapter  4.  We  develop  a  rigorous  system 
model  within  Chapter  3.  The  system  model  is  developed  basically  for  providing  a 
basis  for  solving  the  outer  join  problem  but  is  also  used  for  the  instantiation  efficiency 
problem.  Finally,  conclusion  follows  in  Chapter  5. 


Chapter  2 

Background  Framework 


2.1  Introduction 

In  this  chapter,  we  provide  the  framework  upon  which  this  thesis  stands.  We  start 
from  a  general  framework  for  integrating  objects  and  databases  and  categorize  the 
general  framework  in  Section  2.2  through  Section  2.5.  Two  different  dimensions  are 
used  to  categorize  the  general  framework:  integration  approach  and  binding  time. 
Meanwhile,  we  narrow  down  our  focus  to  the  view-object  framework,  which  is  de¬ 
scribed  in  Section  2.6.  The  view-object  framework  is  what  this  thesis  is  built  upon. 


2.2  Integration  of  Objects  and  Databases 

We  distinguish  two  alternative  approaches  to  the  integration  of  objects  and  databases: 
the  direct  object  storage  approach  and  the  indirect  base  relation  storage  approach.  In 
the  object  storage  approach,  an  object-oriented  model  is  used  uniformly  for  appli¬ 
cations  and  persistent  storage  [3,  1,  2,  5,  6,  89];  objects  are  retrieved  and  stored  as 
objects.  In  the  relation  storage  approach,  an  object-oriented  model  is  used  for  the 
applications  while  a  relational  storage  model  is  used  for  persistent  storage  [4,  8,  9, 
10,  11,  12,  19,  22],  and  objects  are  retrieved  by  evaluating  queries  to  databases^. 

^  There  are  some  systems  which  cannot  be  put  strictly  in  either  of  these  two  categories.  For 
Example,  PCLOS  [20]  allows  both  possibilities.  The  storage  can  be  relational,  object-oriented,  or 
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The  relation  storage  approach  incurs  the  overhead  of  mapping  between  different 
models  [14,  25],  but  is  useful  for  large  databases  since  the  relation  storage  approach 
supports  sharing  of  different  user  views  better  than  the  object  storage  approach. 
Direct  storage  of  objects  is  simple,  but  inhibits  sharability  |14].  For  example,  let 
us  assume  two  users  define  Employee  objects  differently  as  Eii^loyee(name,  salairy) 
and  En5)loyee(name,  depeurtment)  respectively.  In  the  object  storage  approach,  the 
two  Eii5>loyee  objects  are  stored  separately.  To  provide  sharing  requires  a  separate 
mechanism  for  identifying  the  owners.  In  the  relation  storage  approach  however,  this 
problem  does  not  occur  because  the  information  to  support  the  two  Enqjloyee  objects 
is  stored  in  a  single  relation  En53loyee(name,  saleary,  department),  and  their  owners 
are  distinguished  by  the  database  view  mechanism. 


2.3  Two  Perspectives  of  the  Relation  Storage  Ap¬ 
proach 

We  observed  two  different  perspectives  within  the  relation  storage  approach:  object- 
centered  [4,  9,  11,  12]  and  relation- centered  [19,  22].  In. object-centered  perspective, 
relation  schemas  are  generated  from  given  object  schemas,  i.e.,  types  and  their  hierar¬ 
chy.  Relations  are  the  destination  for  storing  objects,  and  objects  are  decomposed  into 
relations  using  the  concept  of  normalization.  On  the  other  hand,  in  relation- centered 
perspective,  object  schemas  are  defined  from  given  relation  schemas.  Relations  axe 
the  source  for  generating  objects,  and  objects  are  composed  from  relations.  The  com¬ 
position  of  objects  is  useful  for  building  object-oriented  apphcations  on  top  of  existing 
relational  databases^.  The  two  perspectives  may  look  hke  the  two  sides  of  the  same 
coin,  but  they  differ  in  terms  of  design  approach.  Figure  2.1  shows  the  two  perspec¬ 
tives.  In  Figure  2.1a,  the  Project-manager  type  is  mapped  to  the  Project -manager 
relation.  There  exists  a  separate  relation  for  each  corresponding  object  type.  In 
Figure  2.1b,  there  does  not  exist  a  separate  Project -manager  relation  in  the  given 

even  a  file  system  [21]. 

^We  cannot  throw  away  the  ielation2d  data  model  in  a  decade.  Remember  that  the  IMS  hierar¬ 
chical  data  model  implementation  is  still  prevalent  while  we  call  the  relational  model  ‘conventional’. 
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Type  Employee 
"fis-a 

Type  Project-manager 

_ -U-generates _ 

Relation  Employee(ssn,  . . .) 
Relation  Project-manager(ssn,  . . .) 


(a)  Object-centered  perspective 


(b)  Relation-centered  perspective 


Figure  2.1:  Two  perspectives  of  relation  storage  approach 

database.  Rather,  the  Project -manager  type  is  defined  as  an  abstraction  through 
views,  such  as  defining  a  join  between  the  En^loyee  relation  and  Project  relation 
along  the  manager-ssn  foreign  key.  The  join  retrieves  only  the  employees  that  are 
managing  one  or  more  projects.  Let  us  consider  the  Project -manager  as  a  derived 
relation  of  the  Employee  and  Project  relations.  Note  the  derived  relation  is  anzdo- 
gous  to  the  intensional  database  (IDB)  relation  [32,  34]  used  in  the  integration  of  the 
logic-based  model  and  relational  model  [34,  35,  36].  For  example,  the  IDB  relation  of 
the  Project -manager  is  written  as  follows  using  the  notion  of  Datalog  [32]. 

Project -manager(ssn,  •  •  •)  :  —  En5)loyee(ssn,  •  •  •)  &  Project(-  •  • , manager-ssn,  •  • 

ssn  =  manager-ssn. 

We  use  the  relation-centered  perspective  throughout  this  thesis  but  the  result  is  ap¬ 
plicable  to  the  object-centered  perspective  as  well,  particularly  during  execution  (op¬ 
erationally). 
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2.4  Instantiating  Objects  from  Relations  through 
Views 

Views  provide  a  user- defined  subset  of  a  large  database.  Thus,  as  mentioned  in  Sec¬ 
tion  2.3,  views  are  used  as  a  tool  for  providing  sharing  and  abstraction  in  interfacing 
between  an  object-oriented  model  and  the  relational  model.  We  also  want  to  use 
the  views  for  instantiating  objects  from  relations.  To  achieve  this,  views  should  pro¬ 
vide  mapping  between  heterogeneous  structures  of  the  two  models.  The  mapping  is 
done  by  linking  object  attributes  to  corresponding  relation  attributes.  Objects  have 
a  more  complex  structure  than  relations.  For  instance,  objects  support  aggregation 
hierarchies  [88,  72]  through  an  is-part-of  relationship^.  Hence  objects  have  a  nested 
structure,  which  is  different  from  nested  tuples  because  the  type  of  an  attribute  can 
be  a  reference  to  another  object.  Therefore,  given  relation  attributes,  it  is  difficult  to 
map  the  relation  attributes  to  object  attributes  without  explicitly  specified  mapping 
information.  We  thus  need  to  extend  the  views  by  adding  additional  component  for 
the  mapping,  that  is,  an  attribute  mapping  function. 

Figure  2.2  shows  an  example  of  instantiating  objects  through  such  an  extended 
view.  The  object  type  defines  the  structure  of  objects  to  be  retrieved  from  the 
database.  The  query  part  of  the  view,  what  we  call  a  view-query^  specifies  how  to  ma¬ 
terialize  the  objects  from  the  relational  database.  The  join  between  the  Employee  re¬ 
lation  and  the  Child  relation  has  the  semantics  of  nesting  such  as  “For  each  Employee 
tuple,  retrieve  the  matching  tuple  in  the  Child  relation.”  The  outer  relation  is  called 
a  source  relation  and  the  inner  relation  is  called  a  destination  relation  in  our  work. 
The  attribute  mapping  part  of  the  view  shows  the  aggregation  hierarchy  of  object 
attributes  and  their  mapping  to  relation  attributes.  The  mapping  is  one-to-one  as 
long  as  there  is  no  derived  attribute  among  the  object  attributes.  We  use  the  key  at¬ 
tribute  of  one  of  the  relations  as  the  source  of  the  object  identifier  (oid).  In  Figure  2.2, 
the  key  ssn  of  the  En5)loyee  relation  is  retrieved  to  become  the  oid  of  the  Employee 

^Objects  also  support  a  generalization  hierarchy  through  is-a  relationship,  inheriting  part  of  the 
attributes  from  parent  objects.  We  regarded  the  inherited  attributes  as  well  as  the  local  attributes 
uniformly  as  belonging  to  the  objects. 
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Database  schema:  /*  Underlined  attributes  are  keys.  * / 

Employeefssn.  e_name,  sex,  degree,  salary,  dept^^) 

Engineer(ssn,  specialty,  experience) 

Department(dept#,  d_name,  manager_ssn,  address) 

Child(ssn,  c-name.  sex,  birth-date) 

Object  Type  Employee  /*  [  ]  denotes  a  tuple.  */ 

[name:  string,  dept:  Department, 
children:  [name:  string,  birthDate:  string]] 

View: 

•  Query  expressed  in  relational  algebra: 

^{ssn,e_name,dept:j5^,c_name,birth_date}  Employee  Child 

•  Mapping  between  object  attributes  and  relation  attributes: 


ssn  e_name  dept#  <ssD,c_name>  c_name  birth_date 
Figure  2.2:  An  example  of  instantiating  an  object  type  through  views 
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object.  Object  id’s  are  not  explicitly  defined  in  the  type  definition  but  assumed  to 
exist  implicitly.  The  dept  attribute  of  an  Employee  object  has  type  Depeurtment .  We 
call  an  attribute  whose  type  is  another  object  type  a  reference  attribute.  In  object- 
oriented  paradigm,  a  reference  is  implemented  with  the  oid  of  the  referenced  object. 
In  our  framework,  the  value  of  a  reference  attribute  is  retrieved  from  the  key  of  a 
database  relation  which  is  mapped  to  the  oid  of  the  referenced  object.  Thus,  in  Fig¬ 
ure  2.2,  the  dept  attribute  of  an  En^jloyee  object  is  retrieved  from  the  dept#  of  the 
Depeirtment  relation,  if  we  assume  that  there  exists  a  type  Department  whose  object 
id  is  retrieved  from  the  dept#  of  the  Depanrtment  relation.  The  children  attribute 
defines  a  subobject  of  the  En5)loyee  object,  and  each  subobject  has  its  own  attributes 
-  name  and  birthDate.  Here  a  ‘subobject’  is  defined  as  an  object  which  does  not  have 
its  own  type  definition  but  has  its  structure  contained  in  another  object  which  again 
may  be  a  subobject  of  another  object.  Like  the  Employee  object,  a  children  subob¬ 
ject  is  assumed  to  have  its  object  id,  but  the  object  id  is  not  actually  retrieved  from 
a  database  relation.  The  id’s  of  the  children  subobjects  are  needed  for  a  different 
purpose,  which  will  be  discussed  in  Section  3.4.3. 


2.5  Object  Instantiation  Time 

The  integration  of  objects  and  databases  can  be  distinguished  according  to  another 
dimension  -  the  binding  time  [51,  52]  of  an  object  type.  Given  an  object  type,  we 
define  its  binding  time  as  the  time  when  its  instances  are  retrieved  from  databases 
into  an  application  space. 

A  binding  time  can  be  distinguished  into  early  binding  and  late  binding.  Early 
binding  is  a  compiled  approach.  That  is,  all  instances  of  an  object  type  are  retrieved 
all  at  once  prior  to  the  usage  by  an  application  program.  In  this  sense,  the  early 
binding  is  similar  to  caching  [59,  60]  or  prefetching  [61].  Once  all  instances  of  an 
object  type  are  retrieved,  an  apphcation  does  not  incur  the  cost  of  retrieving  the 
instances  of  the  same  object  type  unless  the  retrieved  instances  are  invalidated  by 
the  change  of  the  data  stored  in  databases.  Early  binding  becomes  a  feasible  idea  if 
an  application  works  in  a  canned  transaction  in  which  it  is  possible  to  preanalyze  the 
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Object  storage 

Base  relation  storage 

Object-centered 

Relation-  centered 

Early  binding 

View-objects 

Late  binding 

Table  2.1:  View-object  framework 


set  of  objects  that  will  be  used  by  an  application.  On  the  other  hand,  there  may  be 
a  situation  in  which  the  loading  time  for  instantiating  all  instances  of  an  object  type 
is  significant  but  this  loading  time  does  not  pay  off  because  the  application  does  not 
use  all  the  retrieved  instances.  In  case  only  a  small  subset  of  the  retrieved  objects  are 
used,  late  binding  is  more  appropriate.  Late  binding  is  an  interpreted  approach. 
That  is,  instances  of  an  object  type  are  retrieved  one  at  a  time  on  demand  during 
the  execution  of  the  application  program.  Late  binding  makes  it  possible  for  an 
application  to  retrieve  only  the  objects  that  are  actually  needed  during  execution  and 
hence  takes  less  main  memory  space  than  early  binding.  However,  if  all  the  instances 
turn  out  to  be  used  during  the  execution  of  an  apphcation,  late  binding  strategy 
becomes  worse  than  early  binding  by  incurring  as  many  object  requests  to  databases 
as  the  number  of  used  objects.  Note  that  the  early  binding  incurs  the  object  request 
only  once  for  a  given  object  type  as  long  as  the  retrieved  instances  remain  valid. 

From  a  system  design  point  of  view,  we  can  think  of  a  range  of  choice  between 
the  early  binding  and  the  late  binding,  i.e.,  between  the  compiled  approach  and 
the  interpreted  approach.  This  is  analogous  to  the  interpreted-compiled  range  (I- 
C  range)  in  interfacing  the  Prolog  with  relational  databases  [53].  The  criteria  of 
choosing  between  the  I-C  range  are  the  execution  time  and  memory  space.  That  is, 
ideally  we  want  to  retrieve  the  minimum  number  of  objects  that  are  needed  by  an 
apphcation  at  the  minimum  number  of  object  requests. 


2.6  View-object  Framework 
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2-6.1  View-objects 

In  [14],  Wiederhold  proposed  database  views  as  a  tool  for  “connecting  between  object 
concepts  in  programming  languages  and  view  concepts  in  database  systems”.  A  view 
is  defined  by  an  external  schema  at  the  external  level  of  the  ANSI/ SPARC  architecture 
[27,  28].  Different  groups  of  users  can  have  different  views  on  the  same  database. 
A  view  has  been  used  as  a  mechanism  for  mapping  between  the-  different  external 
schemas  of  different  user  views  and  the  conceptual  schema  of  the  entire  database 
in  two  ways.  The  goal  of  the  view  mechanism  is  twofold:  windowing  and  security. 
Users  access  the  same  database  through  different  ^windows’  defined  by  different  views. 
Query  formulation  is  simplified  by  enabling  a  user  to  write  a  query  as  if  a  view  were 
just  another  base  relation.  At  the  same  time,  users  are  restricted  to  access  only  a 
subset  of  a  database,  defined  by  a  view^.  The  goal  of  windowing  emphasizes  using 
views  as  a  tool  for  materializing  a  subset  of  data  from  relations,  while  the  goal  of 
security  puts  more  emphasis  on  using  views  as  a  tool  for  managing  a  database  system. 

Wiederhold’s  proposal  of  view-objects  put  more  emphasis  on  the  goal  of  window¬ 
ing,  that  is,  using  views  as  a  tool  for  materializing  view-objects  from  relations.  A 
principal  way  of  storing  relations  is  to  normalize  them  into  nonredundant,  unambigu¬ 
ously  updatable  form  -  Boyce- Codd-normal  Form,  for  example.  A  materialized  view 
is  only  in  the  first  normal  form  and  is  closer  to  an  ^object’  in  the  sense  that  related 
attributes  are  brought  together.  For  example,  the  view  of  the  Enqsloyee  object  type 
in  Figure  2.2  brings  together,  when  materialized,  the  information  about  an  employee 
and  the  information  about  the  employee’s  children.  Note  that  the  attributes  of  an 
entity  denoting  a  real  world  object  are  decomposed  into  the  attributes  of  normalized 
relations  in  a  database  design  process.  We  can  say  that  a  view  is  used  to  Sreassemble’ 
the  decomposed  attributes  into  the  attributes  of  the  entity. 

Objects  that  we  are  dealing  with  in  this  thesis  are  view-objects  because  the  ob¬ 
jects  are  insteuitiated  by  materializing  a  view.  In  our  work,  a  view-object  is  a  complex 
object  which  is  implemented  by  a  nested  relation  and  supports  references  among  ob¬ 
jects.  Table  2.1  illustrates  where  a  view-object  belongs  to  among  the  two-dimensional 

^It  is  typical  that  a  database  administrator  has  the  privilege  of  maintaining  the  security  of  a 
database  system  through  this  view  mechanism  by  assigning  views  to  each  group  of  users. 
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categories  of  the  framework  that  were  discussed  in  Section  2.2  through  Section  2.5. 
The  view-object  framework  belongs  to  the  relation- centered  perspective  of  the  rela¬ 
tion  storage  approach.  Early  binding  is  assumed,  that  is,  the  results  of  a  view-query 
are  retrieved  all  at  once  into  an  application  workspace  and  restructured  into  objects. 
The  client-server  architecture  is  appropriate  for  supporting  the  view-object  frame¬ 
work  [14].  In  this  architecture,  a  subset  of  the  database  content  residing  on  a  server 
is  retrieved  to  a  client  workstation  and  used  to  provide  objects  (after  necessary  re¬ 
structuring)  during  the  execution  of  an  application. 


2.6.2  Related  Work  on  View-objects 

In  [14],  a  view-object  generator  was  proposed  as  an  important  component  of  the 
system  implementing  the  view-object  concept.  Based  on  this  proposal,  Barsalou  et  al. 
[15,  16, 17]  implemented  a  view-object  generator  in  their  Penguin  project  [22,  23,  24]. 
Besides,  Cohen  [18]  implemented  a  different  kind  of  view-object  generator  in  his  OBI 
project. 

2.6. 2.1  Penguin 

Penguin  is  ^ln  expert  database  system  being  built  at  the  Stanford  University  for 
applications  in  the  areas  of  biomedical  engineering,  civil  engineering,  and  electrical 
engineering.  In  the  Penguin  project,  Barsalou  et  al.  implemented  a  view-object  gen¬ 
erator  using  a  structural  data  model  [13].  The  structural  data  model  is  essentially 
a  relational  data  model  and  is  augmented  with  connections.  The  connections  repre¬ 
sent  interrelational  constraints  such  as  referential  integrity  constraints  and  cardinality 
constraints.  Barsalou  et  al.  used  an  object  template  a  tool  for  formulating  a  view- 
query.  An  object  template  is  a  data  structure  with  different  attributes  (or  slots). 
Users  formulate  a  view  by  designating  a  pivot  relation  [16,  17]  and  selecting  connec¬ 
tions  to  follow  among  the  connections  to  neighboring  relations.  For  manipulating  the 
overlapping  views  of  multiple  objects,  the  object  templates  are  configured  into  a  hier¬ 
archy.  When  an  object  needs  to  be  instantiated,  users  select  the  corresponding  object 
template  and  specify  selection  conditions  on  a  set  of  relations  defined  in  the  object 
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template.  The  system  then  formulates  a  SQL  query  and  delivers  it  to  the  database. 
The  query  result  is  restructured  into  view-objects  using  a  NEST  [70]  procedure.  At 
the  time  of  this  writing,  a  second  prototyping  of  the  Penguin  project  is  still  ongoing 
work  at  the  Stanford  University. 

2.6.2.2  OBI 

OBI  is  a  Trolog-based  view-object-oriented  database’  designed  and  implemented  at 
the  David  Sarnoff  Research  Center®.  The  goal  of  the  OBI  project  was  to  design 
and  implement  a  Prolog-based  hybrid  system  of  relational  databases  and  object- 
oriented  databases.  In  OBI,  Cohen  designed  a  view-object  manager  and  a  direct 
object  manager  as  a  dual  system.  The  purpose  of  the  dual  approach  was  to  make 
it  possible  to  move  persistent  data  from  relation  storage  to  object  storage  back  and 
forth.  OBI  uses  its  own  data  definition  and  query  language  for  the  view-object 
manager.  The  query  language  is  similar  to  SQL  and  can  express  a  predicate  of  domain 
relational  calculus  within  a  query.  In  its  implementation  using  Prolog,  OBI  queries 
are  translated  into  a  Prolog  goal  and  is  executed  by  a  standard  Prolog  execution 
mechanism.  Unlike  the  Penguin  view-object  generator,  no  separate  NEST  procedure 
is  necessary.  The  view-object  manager  materializes  a  nested  relation  directly  out  of 
relational  databases. 
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Chapter  3 

Outer  Joins  and  Filters  in  a 
View-Query 


3.1  Introduction 

In  this  chapter,  we  develop  a  mechanism  for  deciding  on  inner  joins  or  outer  joins, 
and  prescribing  non-null  filters  for  a  view-query.  We  first  formulate  our  problem  in 
a  concrete  manner  in  Section  3.2.  Then,  we  develop  a  rigorous  system  model  to 
facilitate  the  mapping  between  objects  and  relations  in  Section  3.3.  The  mechanism 
is  developed  in  Section  3.4.  A  summary  of  this  chapter  follows  in  Section  3.5. 


3.2  Problem  Formulation 

In  this  section  we  first  introduce  two  operators:  left  outer  join  and  non-null  filters. 
Then,  we  formulate  a  problem  by  exaplaining  the  motivation,  objective,  and  our 
approach  to  the  problem. 

3.2.1  The  Two  Operators 

In  Chapter  1,  we  mentioned  the  need  for  two  operators  for  instantiating  objects  from 
relational  databases  through  views:  a  left  outer  join  and  a  non-null  filter.  A  left  outer 


14 


3.2.  PROBLEM  FORMULATION 


15 


join  is  different  from  an  inner  join  in  that  it  retrieves  null  tuples  when  there  is  no 
matching  tuple  in  the  destination  relation  for  a  given  source  relation.  A  non-null  filter 
is  a  selection  condition  for  eliminating  any  nulls  of  an  attribute  from  a  base  relation^ . 
Formal  definitions  of  the  left  outer  join  and  the  non-null  filter  are  as  follows. 

Definition  3.2.1  (Left  Outer  Join)  Given  two  relations  Ri  and  R2,  a  left  outer 
join  from  Ri  to  R2,  denoted  by  Ri  [X  R2,  is  defined  as  follows. 

RilX  R2  =  {Rit><R2)0({Ri  -J1r,{RiXR2))  X  A)  (3.1) 

where  X  denotes  an  inner  join,  7rj{,(iJi  Xi22)  denotes  the  projection  of  Ri  Xi?2  on 

the  attributes  of  J?i,  and  A  denotes  a  null  tuple  consisting  of  nulls  for  all  attributes 

of  R2.  In  other  words,  Ri  tX  R2  produces  the  following  set  of  tuples. 

ABB 

{<  ti,t2  >  |ti  €  Ri  A  t2  £  R2  A  ti.A6t2.B}  U 
■{<1  ti>  A  >  |ti  G  RiA  ,9t2(t2  £  R2  A  ti.A0t2.B'jy  (^‘^) 

where  6  denotes  a  comparison  operator,  i.e.,  0  €  {<,  <,  >,  >,  =,  7^}. 

For  the  rest  of  this  chapter,  we  use  a  small  size  join  symbol  (IX)  to  denote  a  join  which 
can  be  (has  not  yet  been  determined  to  be)  either  an  inner  join  ( X )  or  a  left  outer 
join  ( [X  ). 

Definition  3.2.2  (Non-null  filter)  A  non-nuU  filter  is  a  conjunction  of  predicates 
applicable  to  a  base  relation  R,  defined  as  follows. 

R.Ai  ^  null  A  R.A2  ^  null  A  •••  A  R.Ai  ^  null  (3-3) 

where  Ai,  A2,  •  •  • ,  A,-  are  the  attributes  of  R  that  are  not  allowed  to  have  nulls. 

3.2.2  Motivation 

3.2. 2.1  Why  do  we  need  left  outer  joins  and  non-null  filters? 

Objects  are  identified  by  their  identifiers  (oid’s)  only.  In  other  words,  an  object  exists 

even  if  all  its  attributes  are  nulls  long  as  it  has  an  object  id.  Let  us  consider 

base  relation  is  the  relation  defined  by  the  relation  schema  of  a  database,  neither  a  view  nor 
an  intermediate  relation. 
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the  objects  of  typ  Employee  shown  in  Figure  2.2.  An  Employee  object  exists  only 
if  it  has  its  oid  retrieved  from  the  ssn  of  the  Employee  relation.  Assuming  that  the 
En^loyee  object  allows  null  for  its  children  attribute,  what  will  happen  if  the  join 
between  En5)loyee  relation  and  Child  relation  is  evaluated  by  an  inner  join?  Any 
employee  tuple  that  has  no  matching  tuple  in  the  Child  relation  will  be  discarded.  In 
other  words,  any  employee  without  children  will  not  be  retrieved.  Therefore,  we  must 
evaluate  the  join  by  an  outer  join  to  prevent  the  loss  of  employees  that  do  not  have 
children.  Furthermore,  what  we  need  is  not  a  bilateral  outer  join  but  a  unilateral 
outer  join,  because  we  are  not  interested  in  retrieving  a  Child  tuple  that  has  no 
matching  tuple  in  the  En5)loyee  relation,  that  is,  a  child  without  parent.  Therefore, 
a  left  outer  join  is  adequate  assuming  that  the  source,  here  the  Employee,  relation  is 
the  left  hand  side  operand  of  the  join.  We  assume  the  source  relation  is  always  on  the 
left  hand  side  of  a  join  and  thus  use  only  left  outer  joins  for  the  rest  of  this  chapter. 

Now  let  us  assume  the  Enqjloyee  objects  prohibit  nulls  for  the  dept  attribute  since 
a  department  affihation  is  required  of  every  employee.  As  mentioned  in  Section  2.4, 
the  dept  attribute  is  retrieved  from  the  dept#  of  the  En^jloyee  relation.  The  join 
between  the  En^loyee  relation  and  Child  relation  is  immaterial  to  the  retrieval  of 
dept#  attribute.  Rather,  nulls  of  the  dept#  attribute  stored  in  the  tuples  of  the 
relation  Enq>loyee  should  not  be  retrieved.  Therefore,  we  must  filter  the  Employee 
relation  with  a  selection  condition  ‘dept#  ^  null’.  We  call  this  selection  condition 
a  non-null  filter. 

We  see  from  the  above  examples  that  we  frequently  need  left  outer  joins  to  prevent 
the  loss  of  wanted  objects,  and  non-null  filters  to  prevent  the  retrieval  of  unwanted 
nulls . 


3. 2. 2.2  Why  do  we  want  the  system  to  do  it? 

Null- related  semantics  of  object  types  are  hard  to  understand  and  hence  Hkely  to 
induce  errors.  For  example,  the  Enqjloyee  type  definition  shown  in  Figure  2.2  does 
not  distinguish  between  the  semantics  of  ‘employees  and  their  zero  or  more  children’ 
and  the  semantics  of  ‘employees  with  at  least  one  child’.  A  left  outer  join  is  needed 
for  the  former  while  an  inner  join  is  needed  for  the  latter.  The  distinction  is  entirely 
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the  programmer’s  responsibility.  Even  if  the  semantics  is  clear,  it  is  an  effort  for 
the  programmer  to  determine  the  left  outer  joins  and  non-null  filters  given  an  object 
type  and  the  corresponding  view,  especially  if  the  view  defines  many  joins.  Therefore 
mechanization  of  the  process  is  useful. 

3.2.2.3  Why  do  we  want  to  reduce  the  number  of  left  outer  joins  and 
non-null  filters? 

The  view-query  is  processed  more  efficiently  if  we  can  elinunate  a  non-null  filter 
‘R.A  null’  without  affecting  the  query  result,  and  thus  avoid  evaluating  unnecessary 
selection  conditions.  Sometimes  it  is  known  at  the  semantic  level  that  the  column  A 
of  a  relation  R  contains  no  null.  An  example  is  when  A  is  the  key  of  R  and  the  entity 
integrity  [40]  is  preserved. 

The  query  also  becomes  more  efficient  if  we  reduce  the  number  of  left  outer  joins 
and  still  retrieve  the  same  result.  Sometimes  left  outer  joins  produce  the  same  tuples 
as  inner  joins.  For  example  in  Figure  2.2,  if  every  employee  has  one  or  more  children, 
then  the  same  tuples  are  produced  by  either  join  method.  We  know  this  fact  at 
the  semantic  level,  provided  that  the  system  enforces  the  referential  integrity  [40] 
from  Enqjloyee.ssnto  Child. ssn.  As  another  example,  let  us  consider  the  following 
directed  join  graph. 

Ri Ri  ^  Rz R4 

where  the  join  from  R2  to  Rz  is  a  left  outer  join  and  the  others  are  inner  joins.  If  it 
is  known  that  there  always  exists  a  matching  tuple  of  Rz  for  every  tuple  of  Rz^  then 
the  result  of  Ri  XRzlX  RzXRa  is  the  same  as  RiXRzXRzXR^.  Now,  if  we 
evaluate  the  join  as  an  inner  join,  then  the  optimizer  considers  the  three  joins  and 
will  choose  the  most  efficient  order  of  joins.  Let  us  assume  the  join  order  becomes 
^  ^  ^  the  optimal  plan.  On  the  other  hand,  if  we  evaluate  the 

join  as  a  left  outer  join,  the  query  optimizer  can  not  consider  reversing  the  order  of 
Rz  [X  Rz  and  thus  can  not  obtain  the  same  optimal  plan.  In  general,  converting 
a  left  outer  join  to  an  inner  join  allows  the  query  optimizer  to  deal  with  a  larger 
number  of  joins.  This  increases  the  number  of  alternative  plans  but  will  certainly 
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never  generate  less  optimal  plan  than  when  left  outer  joins  are  evailuated  as  such  and, 
therefore,  cannot  be  reordered. 


3.2.3  Problem  Statements 

Our  objective  is  thus  to  develop  a  mechanism  for  the  system  to  decide  whether  the 
joins  of  a  query  should  be  evaluated  by  inner  joins  or  left  outer  joins  when  objects  are 
instantiated  from  relational  databases  through  views.  In  addition,  the  system  decides 
which  relations  should  be  filtered  through  non-null  filters.  For  efficiency  reason,  the 
number  of  left  outer  joins  and  non-null  filters  should  be  reduced  whenever  possible. 

3.2.4  Our  Approach 

The  heterogeneity  of  the  object-oriented  model  and  the  relational  model  causes  several 
difficulties  in  mapping  between  the  two  models  [41] .  Hence  we  cannot  expect  a  simple 
solution  to  our  problems  without  a  well-defined  system  model.  The  system  model 
shoidd  satisfy  the  following  criteria. 

•  It  provides  the  context  in  which  we  can  develop  a  simple  solution  to  the  problem. 

•  It  is  based  on  a  standard  model  and  can  be  easily  implemented  in  many  existing 
systems. 

Given  the  system  model,  we  develop  a  mechanism  for  solving  the  problem.  We  use 
only  one  parameter  that  users  should  provide  to  the  system.  It  is  a  non-null  option 
on  the  object  attribute  as  will  be  explained  in  Section  3.3.1.  Users  do  not  even  have 
to  know  what  a  left  outer  joins  is.  To  prevent  losing  nonmatching  tuples  when  nulls 
are  allowed  (by  default),  all  joins  of  a  query  eire  initialized  to  left  outer  joins.  The 
semantics  of  the  non-nuU  options  are  interpreted  as  non-null  constraint^  on  object 
attributes,  and  mapped  to  corresponding  non- null  constraints  on  the  query  result. 
Then  we  replace  some  left  outer  joins  by  inner  joins  and  add  non-null  filters  to  some 

^These  constraints  require  the  existence  of  an  object  attribute  given  the  oid  of  an  object.  We 
would  call  this  constraint  2is  an  existence  constraint  if  this  term  were  not  already  used  in  [32]  to 
mean  the  same  concept  as  the  referential  integrity. 


3.3.  SYSTEM  MODEL 


19 


relations  accordinglj'.  Finally,  the  number  of  left  outer  joins  and  non- null  filters  are 
reduced  using  the  integrity  constraints  of  the  data  model. 

The  non-null  options,  and  accordingly  the  non-null  constraints,  are  used  as  the 
correctness  criterion  of  the  mechanism.  Sometimes  there  appears  to  be  a  conflict  in 
determing  between  a  left  outer  join  and  an  inner  join.  For  example,  let  us  consider 
two  different  attributes  A  and  B  that  are  projected  from  the  same  relation  R.  If  A 
has  a  non-null  constraint  mapped  from  a  non-nuU  option  on  an  object  attribute  but 
B  does  not  have  such  a  non-null  constraint,  then  the  join  to  the  relation  R  must  be 
an  inner  join  for  the  non-null  constraint  on  A  to  be  satisfied  while  it  does  not  have 
to  be  an  inner  join  for  B.  In  this  case,  we  require  the  mechanism  to  make  sure  that 
no  null  value  of  A  is  retrieved,  even  if  it  also  prevents  nuU  value  of  B  from  being 
retrieved,  and  hence  determine  the  join  to  the  relation  i2  to  be  an  inner  join.  In 
other  words,  the  mechanism  enforces  the  semantics  of  non-null  options  more  strongly 
than  the  semantics  of  the  default  option,  which  allows  nulls.  We  call  this  correctness 
criterion  of  the  mechanism  as  a  non-null  correctness  criterion. 


3.3  System  Model 


The  system  model  has  three  elements:  an  object  type  model,  a  view  model,  and  a 
data  model.  The  object  type  model  defines  the  structure  of  objects.  No  object  type 
model  has  gained  universal  acceptance  [42,  43].  Therefore  we  define  a  model  which 
is  common  to  many  existing  object-oriented  models  [1,  6,  8,  4,  5].  Note  that  we  do 
not  deal  with  methods,  but  focus  only  on  object  structures.  The  data  model  uses  the 
relational  model  proposed  by  Codd  [26].  The  view  model  contains  a  relational  query^ 
and  defines  a  mapping  between  objects  and  rela.tions.  We  restrict  the  query  to  an 
acyclic  select-project -join  query  with  conjunctive  join  predicates. 


^We  do  not  assume  the  usage  of  any  specific  query  language  for  our  work. 
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(ssn)  (id)  Object  Employee 
(a)  A  pivot  relation  as  a  base  relation 


(ssn)  (id)  Object  Project-manager 


(b)  A  pivot  relation  as  a  derived  relation 
Figure  3.1:  The  concept  of  a  pivot  relation 

3.3.1  Object  Type  Model 

Many  existing  object-oriented  models  support  aggregation  through  nested  structures 
and  references.  For  example,  the  Employee  object  of  Figure  2.2  is  an  aggregation  of 
name,  dept,  and  children  where  dept  is  a  reference  to  a  Depeirtment  object,  and 
children  is  an  aggregation  of  neune  and  birthDate.  The  children  attribute  defines 
an  embedded  substructure  of  the  En^loyee  object.  Thus  our  object  type  has  a  similar 
structure  as  the  complex  object  [44,  45,  46]. 

We  use  value-oriented  object  id’s  [49,  50]  and  retrieve  them  from  the  keys  of 
relations^.  Those  relations  providing  object  id’s  are  called  pivot  relations  [16,  17]. 
As  discussed  in  Section  2.3,  an  object  is  mapped  semantically  to  a  derived  relation 
rather  than  a  base  relation  if  no  base  relation  provides  the  same  semantics  as  the 
object  type.  Figure  3.1  illustrates  these  concepts.  In  Figure  3.1a,  the  Employee 
relation  is  the  pivot  relation  for  the  Enployee  object  and  provides  its  key  ssn  as  the 
object  id.  Figure  3.1b  shows  the  derived  relation  Project -memager  of  Figure  2.1, 
which  becomes  the  pivot  relation  for  the  Project -manager  object.  It  is  defined  by 
Eiiq)loyee  X  Project,  and  the  key  ssn  of  Employee  in  the  join  result  is 
retrieved  as  the  object  id. 

We  do  not  consider  derived  attributes  for  our  object  type.  Derived  attributes  have 

identifiers  are  usable  as  well.  Otherwise  we  cissume  the  system  maintains  a  mapping 
between  system-generated  object  id’s  and  the  keys  of  the  corresponding  relations. 


Relation  Employee 

I 

ssntximanager-ssn 
Relation  Project 


Relation  Employee 
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no  direct  mapping  to  relation  attributes  and,  therefore,  are  computed  separately  from 
relation  attributes. 

An  object  type  is  defined  formally  as  a  tuple  of  attributes,  [Ai ,  Aa,  •  •  • ,  Xj ,  X2 ,  •  •  •] 
where  each  Ai  is  a  simple  attribute,  and  each  Xi  is  a  complex  attribute.  Each  attribute 
is  either  local  to  the  object  or  inherited  from  its  parent,  and  we  consider  both  the 
local  and  inherited  attributes  as  ‘defined’  in  an  object  type.  An  attribute  is  described 
in  Backus-Naur  Form  as  follows. 

'  attribute  simple  attribute  |  complex  attribute 
simple  attribute  internal  attribute  |  external  attribute 
.  complex  attribute  ::=  [  attribute,  attribute,  •••  ] 

A  simple  attribute  has  an  atomic  value  or  a  set  of  atomic  values.  It  is  either 
internal  or  external  to  the  object.  An  internal  attribute  has  a  primitive  data  type 
such  as  string,  integer,  etc.,  while  an  external  (or  reference)  attribute  has  another 
object  type  as  its  data  type.  The  value  of  an  external  attribute  is  the  oid  of  the 
referenced  object.  A  complex  attribute  defines  a  subobject  or  a  set  of  subobjects  by 
embedding  its  type  definition  within  the  object  type.  In  the  same  way  as  an  object 
id  is  mapped  from  the  key  of  a  pivot  relation,  a  subobject  also  has  an  associated  oid 
which  is  mapped  from  the  key  of  a  base  relation.  However,  the  oid  of  a  subobject  is 
not  retrieved  while  the  oid  of  its  (super)object  is  retrieved  from  the  key  of  a  pivot 
relation® . 

We  need  a  way  of  telling  the  system  whether  the  value  of  an  object  attribute  is 
allowed  to  be  null  or  not.  This  is  done  by  attaching  a  non-null  option  to  an  object 
attribute.  This  option  deliberately  declares  that  a  null  value  is  not  allowed  for  the 
attribute.  It  is  equivalent  to  specifying  the  constraint  of  ‘minimum  cardinality  >  0’ 
on  the  attribute®.  Attributes  without  non-null  options  are  allowed  to  have  null  values 
by  default. 

An  example  is  shown  in  Figure  3.2.  The  Project  attribute  defines  its  own  at¬ 
tributes  Eind  becomes  a  subobject  of  the  Programmer  object.  It  has  its  object  id 

®  A  subobject  of  an  object  is  not  a  stand-edone  object  because  it  has  no  object  id. 

®Many  commercial  tools  for  building  object-oriented  system  applications,  KEE[47,  48]  for  exam¬ 
ple,  support  this  option. 
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Type  Programmer 

[  name:  string  non-null,  dept:  Department  non-null,  salary:  integer, 
manager:  Employee,  task:  string. 

Project:  [  title:  string  non-null,  sponsor:  string,  leader:  string, 
depart:  Department  non- null  ]  ] 

Figure  3.2:  An  example  object  type 


oid  title  sponsor  leader  dept 


Figure  3.3:  The  0-tree  of  the  Programmer  object  type 

mapped  from  the  key  of  a  pivot  relation  in  the  same  way  the  Prograunmer  object 
does.  However,  only  the  id’s  of  the  Progreonmer  objects  axe  actually  retrieved.  This 
Programmer  object  example  will  be  used  throughout  the  rest  of  this  chapter. 

Given  an  object  type,  we  can  build  a  tree  consisting  of  its  object  attributes.  We 
call  such  a  tree  an  0-tree  and  define  it  as  follows. 

Definition  3.3.1  (O-tree)  The  0-tree  of  an  object  0  is  a  tree  which  has  the  fol¬ 
lowing  properties. 

•  Its  root  is  labeled  by  ‘O’. 

•  A  leaf  is  labeled  by  a  simple  attribute  of  the  object  O. 

•  An  intermediate  node  (non-leaf)  is  labeled  by  a  complex  attribute  of  the  object 

O. 

An  example  of  an  0-tree  is  shown  in  Figure  3.3  for  the  Progreonmer  type. 
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Here  we  introduce  two  functions  directly  derivable  from  an  object  type;  an  object 
set  (Oset)  and  an  object  chain  (Ochain).  These  two  functions  are  used  to  facibtate 
mapping  between  objects  and  relations. 

Definition  3.3.2  (Oset)  Given  an  object  0,  Oset(C>)  is  defined  as  a  function  re¬ 
turning  the  set  of  the  root  of  the  0-tree  and  all  of  its  non-leaf  descendents. 

For  example,  Oset(Progreunmer)  returns  {Programmer,  Project}.  Note  that  each 
element  of  an  Oset  has  its  object  id  mapped  to  the  key  of  a  pivot  relation. 

Definition  3.3.3  (Ochain)  Given  an  object  O  and  a  simple  attribute  so  of  the 
object  O,  0chain(0,so)  is  defined  as  a  function  returning  the  chain  of  nodes  from 
the  root  (0)  of  the  0-tree  to  a  descendent  node  labeled  so,  i.e.,  O.Oi.  •  •  •  .On.so- 

For  example,  Ochain(Programmer,  title)  returns  Programmer  .Project  .title  and 
Ochain(Programmer,  Project)  returns  Programmer. Project. 

3.3.2  Data  Model 

Integrity  constraints  [38,  39,  40]  are  a  part  of  the  data  model^.  Two  kinds  of  integrity 
constraints  axe  used  in  our  work:  referential  integrity  constraints  and  entity  integrity 
coststraints.  As  mentioned  in  Section  3. 2. 2. 3,  these  integrity  constraints  are  useful  to 
reduce  the  number  of  left  outer  joins  and  non-nuU  filters. 

The  referential  integrity  constraint  is  defined  as  follows. 

Definition  3.3.4  (Referential  integrity  constraint)  A  referential  integrity  con¬ 
straint  from  R.A  to  S.B  requires  that  if  R.A  is  not  null  then  there  exists  a  matching 
value  of  S.B.  That  is: 

Va  €  R.A{a  =  null  V  36  G  S.B{a  =  6))  (3-4) 

^In  the  Penguin  project,  which  was  introduced  in  Section  2.6.2. 1,  the  connections  of  a  struc¬ 
tural  data  model  provide  the  semantics  of  necessary  integrity  constraints,  and  therefore,  integrity 
constraints  need  not  be  specified  separately  by  a  database  designer. 
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Let  us  denote  the  referential  integrity  constraint  by  an  axrow  as  in  R.A  i-»  S.B. 

Our  definition  of  the  entity  integrity  constraint  is  more  extensive  than  the  defini¬ 
tion  used  in  [40]. 

Definition  3.3.5  (Entity  Integrity  constraint)  An  entity  integrity  constraint  re¬ 
quires  one  or  more  of  the  following  conditions  to  be  satisfied. 

•  Primary  key  constraint:  R.A  null  if  A  is  the  primary  key  of  i?®.. 

•  Range  constraint:  If  R.A  is  not  null  then  Ci^i  R.A  92a2  where  ai,  a2  are  non-null 
constants,  and  ^i,^2  are  or  '<’. 

•  Value  constraint:  R.A  =  a  oi  R.A  ^  a  where  a  is  a  constant  which  may  be  null. 

There  can  be  other  kinds  of  entity  integrity  constraint.  For  example,  R.A  can  have 
a  type  constraint  such  as  ‘the  value  of  R.A  must  be  an  integer’.  However,  those 
defined  in  Definition  3.3.5  are  sufficient  for  our  work.  Figure  3.4  shows  the  schema, 
the  referential  integrity  constraints  and  the  entity  integrity  constraints  of  a  sample 
database. 

3.3.3  View  Model 

Figure  3.5  shows  the  components  of  the  view  model.  A  view  consists  of  two 
parts:  a  query  part  and  a  mapping  part.  The  mapping  part  in  turn  consists  of  an 
attribute  mapping  function  (AMF)  and  a  pivot  description  (PD).  The  AMF  defines 
the  mapping  between  object  attributes  (So)  and  relation  attributes  (Sr).  The  PD 
consists  of  a  set  of  pivot  relations  (PS)  and  a  pivot  mapping  function  (PMF).  The 
PMF  defines  the  mapping  between  the  pivot  relations  and  the  (sub)objects®. 

A  high  level  language  can  be  designed  for  defining  a  view.  The  view  should  be 
preprocessed  to  generate  the  mapping  part  as  well  as  the  query. 

3.3. 3.1  Query  Part 


®In  [40],  only  this  constraint  is  used  as  the  entity  integrity  constraint. 

®Or  equivalently,  between  the  keys  of  the  pivot  relations  and  the  id’s  of  the  (sub)objects. 
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/*  Underlined  attributes  are  keys.  */ 

Division  fname.  manager,  super- division,  location) 

Deptfname,  budget,  phone#) 

Emp(ssn,  name,  salary,  dept) 

Engineer(ssn,  degree,  specialty) 

Proj-Assign(emp,  proj,  task) 

Project (proj#,  dept,  leader,  sponsor) 

Sponsor  (name,  phone#,  address) 

Proj-Title(proj# ,  title) 

(a)  Database  schema 

/*  i-»  denotes  a  referential  integrity  constraint.  */ 

Division. manager  Emp.name  Proj-Assign.emp  >  Engineer. ssn 

Division. super-division  Division.name  Proj-Assign.proj  •— »  Project. proj# 

Dept. name  Division.name  Project. dept  >  Dept.name 

Emp.dept  Dept.name  Project.leader  t— ♦  Emp.ssn 

Engineer.ssn  Emp.ssn  Project.sponsor  h-.  Sponsor.name 

Project-title.proj#  Project.proj# 


(b)  Referential  integrity  constraints 

The  keys  of  all  relations  shown  in  the  database  schema  are  disallowed  from  having 
nulls.  In  addition,  Emp.dept  zmd  Emp.name  are  prohibited  from  having  nulls. 

(c)  Entity  integrity  constraints 

Figure  3.4:  A  sample  database 
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PS:  the  set  of  pivots  Oset:  object  set  Ochain:  object  chain 
So:  the  set  of  Ochains  of  object  attributes  appearing  in  flie  object  type 
Sr:  the  set  of  relation  attributes  appearing  in  the  query 

Figure  3.5:  Mapping  between  objects  and  relations 


Programmerl 


(The  keys  of  Engineerl  and  Projectl  are  mapped  to  the  id’s  of  the  Programmer 
object  and  the  Project  subobject  respectively.  Dotted  lines  denote  pivots.) 

Figure  3.6:  The  query  graph  for  the  Programmer  object 
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Figure  3.6  shows  the  query  graph  for  the  Progreonmer  object.  A  query  graph  (QG) 
is  a  directed  connected  graph.  Each  vertex  is  represented  by  the  node  of  a  relation 
R  labeled  with  a  filter  /  and  with  the  set  of  attributes  tt  projected  from  R.  Two 
occurrences  of  the  same  relation  are  distinguished  by  a  tuple  variable  denoted  as  a 
subscript.  Each  edge  represents  a  join  specified  in  the  query.  A  join  is  either  an 
inner  join  or  a  left  outer  join.  Since  left  outer  joins  are  not  symmetric,  the  edges  are 
directed. 


3.3.3.2  Mapping  Part 

Now  we  give  a  more  rigorous  description  of  the  mapping  part.  The  set  of  object 
attributes  So  of  an  object  type  O  is  represented  as  the  set  of  Ochains  as  follows. 

So  =  {Ochain(C),so)|^o  €  Simple_attr(C})} 

where  Simple_attr(C>)  denotes  the  set  of  simple  attributes  of  an  object  type  O. 
Ochain(0,so)  defined  in  Definition  3.3.3.  The  set  of  relation  attributes  Sr  is 
defined  as'  follows. 

Sr  =  {R.A\A  C  Attr(i2)} 

where  H  is  a  relation  occurrence  in  the  query  part  of  a  view  and  Attr(i2)  denotes  the 
set  of  attributes  of  R. 

Since  we  assume  no  derived  attribute,  there  exists  a  one-to-one  mapping  between 
So  and  Sr.  This  mapping  information  is  conteiined  in  the  attribute  mapping  function. 
The  following  example  shows  the  mapping  between  the  So  and  Sr  of  the  Progrannaer 
object. 


Example  3.3.1  (Attribute  Mapping  Function  (AMF)) 
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Programmer .ncune  •<->  Empi.neune, 

Programmer .  dept  Eiiq>i .  dept , 

Programmer .  salary  Empi .  salaory , 

Programmer  .manager  Divisioni.manager , 

Programmer. task  Proj-Assignj .task, 

Programmer. Project. title  Pro j -Tit lej. title, 

Programmer. Project  .sponsor  •«-»  Sponsori  .name , 

Programmer .  Pro j  ect .  leader  Emp2 .  name , 

^  Programmer .  Pro  j  ect .  depart  Proj  ect  i .  dept 

As  shown  in  Figure  3.1,  a  pivot  relation  is  either  a  base  relation  or  a  derived  rela¬ 
tion.  If  it  is  a  base  relation,  its  key  is  mapped  to  the  object  id.  If  it  is  a  derived  rela¬ 
tion,  the  key  of  one  of  its  base  relations  is  mapped  to  the  object  id.  For  example,  the 
query  for  the  Programmer  object  has  two  pivot  relations,  Programmeri  and  Projecti. 
Here  Projecti  is  a  base  relation  and  Programmeri  is  a  derived  relation  defined  by 
(Engineer!,  {Engineen  ^T^ask  =  “programming-  Proj-Assigm}).  A 

formal  definition  of  a  derived  relation  is  as  follows. 

Definition  3.3.6  (Derived  relation)  A  derived  relation  of  an  object  type  0  is  an 
ordered  pair  {R^,,  E)  where  R\,  is  a  base  relation  whose  key  is  mapped  to  the  oid  of  the 
object  type  O,  and  E  is  a  select-join}°  expression  such  that,  for  arbitrary  instances 
of  the  relations  in  E: 

•  /  £  A  nKey,„.,£'  =  nKey,«.,£) 

That  is,  the  result  of  evaluating  E  produces  a  subset  of  the  keys  available  from  Ri, 
and  there  is  no  other  select-join  expression  which,  when  evaluated,  produces  the 
same  set  of  keys. 

For  every  object  and  its  subobject,  there  always  exists  one  and  only  one  relation 
occurrence  whose  key  is  mapped  to  the  oid.  In  other  words,  there  is  a  one-to-one 


^ ^Selection  is  not  required  while  join  is  required. 
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mapping  between  the  object  set  defined  in  Definition  3.3.2  and  the  set  of  pivot  re¬ 
lations  (PS).  This  mapping  information  is  contained  in  the  pivot  mapping  function. 
For  example,  the  mapping  between  the  Oset  and  PS  of  the  Programmer  object  is  as 
follows. 

Example  3.3.2  (Pivot  Mapping  Function  (PMF)) 

Programmer  Programmeri,  Project  •«->  Projecti 

□ 

As  mentioned  in  Section  3.3.1,  we  associate  value-oriented  object  id’s  with  an 
object  and  its  subobjects.  These  oid’s  are  invisible  in  the  type  definition  and  their 
mappings  to  relation  attributes  are  not  explicitly  specified  in  the  attribute  mapping 
function.  These  mappings  are  derived  from  the  information  stored  in  the  pivot  de¬ 
scription  using  the  following  algorithm. 

Algorithm  3.3.1  (Mapping  between  oid’s  and  relation  attributes) 

Input:  Ochain,  AMF  without  the  mapping  of  oid’s,  PS,  PMF. 

Output:  AMF  with  the  mapping  of  oid’s. 

For  each  pivot  relation  p  G  PS  begin 
If  p  is  a  base  relation 

then  append  ‘Ochain(C>,  PMF(p)).id  p.Key(p)’  to  AMF. 
else  /*  p  is  a  derived  relation  */  begin 
Find  the  base  relation  Rb  of  p. 

Append  ‘0chain(0,  PMF(p)).id'«-»  i2fc.Key(i2i,)’  to  AMF. 
end. 
end. 

For  example,  given  the  set  of  pivot  relations  and  the  pivot  mapping  function  of  the 
Programmer  view,  Algorithm  3.3.1  derives  the  following  mappings  between  the  id’s 
of  the  Programmer  object  and  its  Project  subobject  and  their  corresponding  pivot 
relation  keys. 

Example  3.3.3  (Addition  to  AMF) 

Programmer. id  •<-»  Engineeri . ssn, 


30 


CHAPTER  3.  O  UTER  JOINS  AND  FILTERS  IN  A  VIEW-  Q  VERY 


Programmer. Project  .id  Projecti  .proj# 

□ 

These  are  appended  to  the  AMF. 

There  is  a  constraint  on  the  definition  of  the  attribute  mapping  function.  Let 
us  consider  two  object  attributes  sq  and  sj  which  belong  to  the  same  node  of  an 
0-tree  and  their  mapped  relation  attributes  AMF(so)  and  AMF(si).  Then  AMF(so) 
and  AMF(si)  must  either  belong  to  the  same  relation  or  there  exists  a  one-to-one 
cardinality  relationship  between  them. 

The  attribute  mapping  function  is  essential  for  mahing  it  simple  to  map  between 
objects  and  relations,  as  will  be  demonstrated  in  the  following  section. 

3.4  Development  of  the  Mechanism 

Now  we  describe  the  mechanism  for  prescribing  joins  in  a  query  as  inner  joins  or  left 
outer  joins,  and  also  for  generating  non-null  filters  for  some  relations  in  the  query. 
We  first  present  an  overview  of  our  mechanism,  and  then  discuss  each  step  in  detail. 

3.4.1  Overview 

There  are  two  sources  of  nulls  retrieved  from  databases.  One  is  from  the  nulls  stored 
in  the  tuples,  the  other  is  from  any  outer  join  failure.  Inner  joins  create  nulls  from 
the  first  source  only,  while  outer  joins  create  nulls  from  both  sources.  Objects  allow 
nulls  by  default,  and  need  only  one  kind  of  outer  join,  a  left  outer  join,  as  explained  in 
Section  3. 2.2.1.  Therefore  our  strategy  is  to  initialize  all  joins  of  a  query  as  left  outer 
joins  and  then  replace  part  of  them  by  inner  joins  at  each  step  of  our  mechanism. 
The  steps  of  our  mechanism  is  as  follows. 

1.  Compile  the  object  type  O  and  generate  the  object  set  (Oset)  and  the  set  of 
0chain(0,  so)’s  for  aH  the  attributes  defined  in  0. 

2.  Preprocess  the  view  and  generate  the  query  and  the  mapping  part  -  AMF, 
PMF,  and  PS. 
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3.  Derive  the  mappings  between  object  id’s  and  the  keys  of  pivot  relations  using 
Algorithm  3.3.1,  and  add  the  result  to  the  attribute  mapping  function. 

4.  Initiahze  all  joins  of  the  query  as  left  outer  joins. 

5.  Replace  all  joins  within  derived  relations  by  inner  joins.  (See  Section  3.4.2.) 

6.  Map  non-nuU  options  on  object  attributes  to  non-null  constraints  on  the  query 
result.  Replace  some  joins  by  inner  joins  and  add  non-nuU  filters  to  some 
relations  accordingly.  (See  Section  3.4.3  and  Section  3.4.4.) 

7.  Find  the  left  outer  joins  which  produce  the  same  tuples  as  inner  joins  due  to 
referential  or  entitj'  integrity  constraints,  and  replace  those  left  outer  joins  by 
inner  joins.  Find  also  the  relations  whose  non-null  filtered  attributes  cannot 
have  nulls  due  to  entity  integrity  constraints,  and  remove  the  non-null  filters 
from  those  relations.  (See  Section  3.4.5.) 

3.4.2  Joins  within  a  Derived  Relation 

As  mentioned  in  Section  2.3,  a  derived  relation  is  a  conceptual  relation  defined  from 
base  relations  via  a  select-join  expression,  and  provides  an  abstraction  of  base  relations 
so  that  the  semantics  of  the  derived  relation  matches  the  semantics  of  the  instantiated 
objects. 

AH  joins  specified  within  a  derived  relation  must  be  inner  joins,  as  shown  by  the 
following  theorem. 

Theorem  3.4.1  Let  us  consider  an  object  type  0  and  a  derived  relation  {Ri,E) 
defined  according  to  Definition  3.3.6.  If  =  jRi  tx  7^2  Pm,  then  all  the  joins 

from  Ri  through  Rn  are  inner  joins. 

Proof:  If  we  assume  a  join  from  Ri  to  iZi+i  is  a  left  outer  join  for  an  arbitrary 
i  €  [l,n,]  while  the  others  are  inner  joins,  then  the  following  is  true. 

•••  Xli^ilX  Ri+iX---  XRri) 

=  nKey(fl,)(i2iX^2X.--XRi) 


(3.5) 
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That  is,  there  exists  another  select-join  expression  which,  when  evaluated,  produces 
the  same  set  of  keys  available  from  Ri-  This  violates  the  second  condition  required 
of  E  in  Definition  3.3.6.  Therefore,  all  the  joins  in  E  must  be  inner  joins.  Q.E.D. 

For  example,  given  a  derived  relation: 

(Engineer! .  {Engineer:  <^task= 'programming'  Proj-Assigm}) 

which  is  defined  to  provide  the  semantics  of  the  Programmer  object,  the  join  between 
Engineer:  and  Pro j -Assign:  niust  be  an  inner  join.  If  the  join  is  evaluated  as  a 
left  outer  join,  it  retrieves  all  tuples  of  Engineer:,  not  just  those  corresponding  to 
programmers,  who  are  defined  as  the  engineers  working  on  a  programming  task  in 
the  assigned  projects. 

Thus,  given  the  set  PS  of  pivot  relations,  we  have  the  following  algorithm. 
Algorithm  3.4.1  (Joins  within  derived  relations) 

Input:  query  graph  (QG)  with  all  joins  initialized  as  left  outer  joins,  and  the  pivot 
set  (PS). 

Output:  query  graph  (QG')  modified  with  inner  joins. 

1.  For  each  derived  relation  {Rb,  E)  in  the  set  of  pivot  relations  (PS), 
replace  all  joins  in  E  by  inner  joins. 

3.4.3  Mapping  Non- null  Options  to  Non- null  Constraints 
on  the  Query  Result 

Let  us  consider  an  object  0  whose  attribute  sq  has  a  non-nuU  option.  The  non-null 
option  requires  that  there  should  exist  a  non-nuU  so  given  the  oid  of  the  object.  Let  us 
denote  this  non-null  constraint  as  O.id  sq.  If  so  is  a  simple  attribute,  it  is  non-null 
if  its  value  is  not  null.  On  the  other  hand  if  so  is  a  complex  attribute,  it  defines  a 
subobject.  An  object  is  non-null  only  if  its  oid  is  non-null.  We  thus  interpret  the 
semantics  of  non-nuU  so  according  to  the  following  rule  of  non-nuU  constraint. 

Rule  3.4.1  (Non-null  constraint)  Let  us  consider  Ochain(0,So)  =  Oo-Oi.  •  ■  •  .0„.So 
If  So  has  a  non-nuU  option  then,  given  0„.id, 

•  If  So  is  a  simple  attribute,  i.e.,  On-id  =^>  So,  then  So  cannot  be  null. 


3.4.  DEVELOPMENT  OF  THE  MECHANISM 


33 


•  If  5o  is  a  complex  attribute,  i.e.,  (9„.id  so-id,  then  so-id  cannot  be  nuU. 

For  example,  given  the  Programmer  object  of  Figure  3.2,  the  non-null  options  on  name 
and  dept  attributes  are  interpreted  as  Programmer .  id  name  and  Programmer .  id 
=^>  dept,  respectively,  because  name  and  dept  are  simple  attributes.  Besides,  the 
non- null  options  on  title  and  depart  are  interpreted  as  Project. id  ^  title 
and  Project. id  =>  depeirt,  respectively.  Beware  that  they  are  not  interpreted  as 
Programmer,  id  title  and  Programmer  .id  =>  depart  because  title  and  depart 
are  the  (direct)  attributes  of  Project  subobject  instead  of  the  Programmer  object. 
On  the  other  hand,  if  there  were  a  non-nuU  option  on  Pro  j  ect ,  it  would  be  interpreted 
as  Programmer . id  =>  Project  .id  because  Project  is  a  complex  attribute. 

Can  we  map  the  non-nuU  constraint  defined  by  Rule  3.4.1  to  the  corresponding 
non-nuU  constraint  on  the  query  result?  This  is  possible  in  our  model  because  the 
oid  of  each  (sub)object  always  has  a  corresponding  pivot  relation  key.  The  attribute 
mapping  function  in  Example  3.3.1  showed  this  correspondence  for  the  Prograimner 
object.  Using  the  correspondence,  the  non-nuU  constraints  on  the  name  and  dept 
attributes  of  the  Progreunmer  object  axe  mapped  to  Engineeri .  ssn  Empi  .name 
and  Engineeri . ssn  =>  En5>i.dept,  respectively.  Likewise,  if  Project  had  the  non- 
Tiiill  option,  its  constraint  would  be  mapped  to  Engineeri. ssn  =>  Projecti.proj#. 
The  non-nuU  option  on  the  title  attribute  is  mapped  not  to  Engineeri. ssn  ^ 
Pro j-Titlei. title  but  to  Project i .pro j#  Proj-Titlei. title  because  title 
is  defined  not  as  an  attribute  of  Programmer  object  but  as  an  attribute  of  Project 
subobject.  For  the  same  reason,  the  non-nuU  option  on  the  depaort  attribute  of 
Project  is  mapped  to  Projecti.proj#  =>  Projecti.dept. 

More  formaUy,  a  non-nuU  option  on  the  attribute  sq  of. an  object  type  0  is  trans¬ 
lated  into  the  non-nuU  constraint  on  the  query  result  as  foUows. 

Algorithm  3.4.2  (Mapping  non-null  options) 

Input:  sm  object  attribute  sq  with  a  non-nuU  option,  attribute  mapping  function 
(AMF),  object  chain  (Ochain). 

Output:  a  non-nuU  constraint  on  the  query  result. 

1.  flo,n  -So  :=  Ochain(0,  So)  =  Oo-Oi.  •  •  •  .O„.so- 
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2.  Rp.A  :=  AMF(fio,n-id).  /*  A  is  always  the  key  of  Rp.  */ 

3.  If  So  is  a  simple  attribute 

then  Rg.B  :=  AMF(no,n-so) 

else  R,.B  ;=  AMF(no,n-so-id).  /*  If  sq  is  a  complex  attribute,  B  is  the  key  of 

R..  V 

4.  Output  the  constraint  ^Rp.A  R,.B\ 

3.4.4  Prescribing  Joins  and  Generating  Non- null  Filters 

With  the  non-null  constraints  on  the  query  result,  we  translate  them  into  the  corre¬ 
sponding  inner  joins  and  non-nuU  filters  of  the  query.  Given  the  constraint  ‘Rp.A 
R,.B'  obtained  from  Algorithm  3.4.2,  it  is  done  as  follows. 

Algorithm  3.4.3  (Translating  non-null  constraints) 

Input:  query  graph  QG'  from  Algorithm  3.4.1,  a  non-null  constraint  Rp.A  R^.B 
Output:  query  graph  QG"  modified  with  inner  joins  and  a  non-nuU  filter. 

1.  Replace  the  filter  /,  on  R,  by  /,  A  (R  ^  null).  /*  Generate  a  non-null  filter.  */ 

2.  /*  Prescribe  a  join.  */ 

(a)  Find  all  directed  join  paths  from  Rp  io  R,. 

(b)  For  each  path  found  in  Step  2a, 

replace  all  joins  on  the  path  by  inner  joins. 

For  example,  given  the  non-nuU  constraints  established  in  Section  3.4.3,  the  follow¬ 
ing  non- null  filters  are  generated  in  the  query  of  the  Programmer  object:  En5>i  .name  ^ 
null,  En5)i.dept  ^  null,  Pro jecti. dept  ^  null,  Proj-Titlei .title  ^  null 
Besides,  the  following  left  outer  joins  axe  replaced  by  inner  joins:  Engineeri  [X 
En^)!,  ProjectilX  Proj-Titlei. 

Now  we  prove  the  correctness  of  Algorithm  3.4.3  with  the  following  theorem,  base 
on  non-null  the  correctness  criterion  explained  in  Section  3.2.4. 
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Theorem  3.4.2  Given  a  join  path  iJi  IX  i22  ^  -Kn  and  a  non-null  .constraint 

Ri.Ai  =^>  Rn-An  on  the  join  result,  the  materiahzed  join  result  satisfies  this  non-null 
constraint  if  and  only  if  all  the  joins  are  inner  joins  and  R^  is  filtered  by  An  ^  null. 

Proof: 

If  part:  If  all  joins  on  the  join  path  are  inner  joins,  any  nonmatching  tuples  are 
discarded.  Then,  the  attribute  An  in  the  join  result  can  have  nuDs  only  if is  not  a 
join  attribute  and  some  tuples  of  Rn  have  null  .4„.  (If  it  is  a  join  attribute,  any  tuple 
of  Rn  with  null  An  is  discarded  by  an  inner  join.)  However,  tuples  with  null  An  are 
removed  from  Rn  by  the  given  non-null  filter.  Therefore  the  constraint  is  satisfied. 
Only  if  part:  We  prove  this  part  by  contradiction.  Let  us  first  assume  Ri  IX  Ri^i 
is  a  left  outer  join  for  some  i  although  the  constraint  is  satisfied  and  let  Ri+i  have 
non-matching  tuples.  Then  a  null  Rn.An  is  retrieved  from  the  nuD  tuples  appended  to 
the  tuples  of  Ri  which  have  no  matching  tuples  in  Ri+i.  This  contradicts  the  assumed 
constraint.  Therefore  all  the  joins  must  be  inner  joins.  Next,  let  us  assume  Rn  is 
not  filtered  by  An  ^  null  although  the  constraint  is  satisfied  and  all  joins  are  inner 
joins.  Then  null  Rn.An  is  retrieved  from  the  nulls  stored  in  Rn.An  if  An  is  not  a  join 
attribute.  This  contradicts  the  assumed  constraint.  Q.E.D. 


3.4.5  Reducing  the  Number  of  Left  Outer  Joins  and  Non¬ 
null  Filters 

We  can  remove  unnecessary  non-null  filters  and  further  reduce  the  number  of  left 
outer  joins  by  using  integrity  constraints. 

3.4. 5.1  Removing  Unnecessary  Non-null  Filters 

Considering  entity  integrity  constraints,  some  non-nuU  filters  are  removed  if  they  are 
defined  on  attributes  which  cannot  have  niill.  A  typical  case  is  when  the  attribute  is  a 
key  (primary  key  constraint)  or  any  other  non-nuU  attribute  designated  in  the  schema 
definition  (value  constraint).  For  example,  we  can  remove  Empi.name  ^  null  and 
En5>i.dept  null  among  the  four  non- null  constraints  generated  in  Section  3.4.4 
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because,  as  it  was  shown  in  Figure  3.4c,  those  two  attributes  are  key  attributes  and 
hence  prohibited  from  having  nulls. 

3.4. 5.2  Further  Reducing  the  Number  of  Left  Outer  Joins 

We  can  also  replace  some  left  outer  equijoins  by  inner  equijoins  if  we  consider  refer¬ 
ential  integrity  constraints.  Since  a  referential  integrity  R.A  S.B  allows  R.A  to  be 
nuU,  we  define  a  stronger  condition  by  introducing  a  variable  min  as  follows. 

Definition  3.4.1  (min)  Given  a  join  Ri  txl  Rj,  let  minjj  denote  the  minimum  number 
of  matching  tuples  in  Rj  for  each  tuple  in  Ri.  Note  minij  is  not  necessarily  the  same 
as  minji. 

Besides,  some  left  outer  non-equijoins  can  be  replaced  by  inner  non-equijoins  if  we 
consider  entity  intergrity  constraints  such  as  range  constraints. 

Using  only  the  semantics  of  min  without  considering  the  instances  of  relations^^ 
we  define  the  following  rules  for  deciding  whether  min  is  greater  than  zero  or  not. 
MIN(i2.>l)  denotes  the  minimum  non-null  value  allowed  for  R.A,  and  MAX{R.A) 
denotes  the  maximum  non-null  value  allowed  for  R.A.  MIN(iE.>l)  and  MAX(i2.74) 
are  known  from  the  range  constraints  or  value  constraints,  if  there  are  any,  on  R.A. 

Rule  3.4.2 

•  Given  a  single  join  predicate  AdB  for  the  join  between  two  relations  Ri  and 
Rj,  min^j  >  0  if  Rj.A  is  a  non-null  attribute  and  one  or  more  of  the  following 
conditions  are  satisfied. 

G  =  and  Ri.A  Rj.B  and  the  filter  fj  on  Rj  is  empty,  or 
e  =  *>’  and  M.lN{Ri.A)  >  UAX{Rj.B),  or 
e  =  ‘>’  and  mN{Ri.A)  >  MAX{Rj.B),  or 
e  =  ‘<’  and  MAX{Ri.A)  <  Mm{Rj.B),  or 
e  =  ‘<’  and  MAX(Ri.A)  <  MIN(R,-.B),  or 

other  words,  we  ignore  the  fact  that  min  may  be  accidentally  greater  than  zero  at  the  instance 
level  although  it  is  judged  to  be  equal  to  zero  at  the  semantic  level. 
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G  =  V’  and  (MIN(i2i.yl)  >  MAX{Rj.B)  or  MAX(i2i.^)  <  MIN(i2,-.B)). 

Otherwise  minij  =  0^^. 

•  Given  a  conjunctive  join  predicate  AiGiBi  A  A2G2B2  A  •  •  •  A  AkOkBk  for  the  join 
between  Ri  and  Rj,  itiinij  >  0  for  the  conjunction  of  join  predicates  if  min,j  >  0 
for  every  single  join  predicate.  Otherwise  inin,j  =  0. 

•  Given  a  disjunctive  join  predicate  AiGiBi  V  A2G2B2  V  •  •  •  V  AkGkBk  for  the  join 
between  Ri  and  Rj,  niin,j  >  0  for  the  disjunction  of  join  predicates  if  mini,-  >  0 
for  at  least  one  join  predicate.  Otherwise  minij  =  0. 

•  Given  a  join  path  between  two  relations,  such  as  il*  M  Ri+i  M  •  •  •  tx  Rj, 
mini,-  >  0  if  mink,k+i  >  0  for  =  i,  •  •  • ,  j  -  1.  Otherwise  mini,  =  0. 

If  mini,  >  0  for  a  join  path  from  Ri  though  Rj,  we  can  replace  all  joins  on  the 
path  by  inner  joins  and  still  get  the  same  query  resiilt. 

Now  we  describe  an  algorithm  for  reducing  the  number  of  left  outer  joins  using 
min.  We  also  show  the  step  of  removing  unnecessary  non-null  filters  in  the  following 
algorithm. 

Algorithm  3.4.4  (Integrity-based  reduction  of  left  outer  joins  and  non-null  filters) 

Input;  query  graph  (QG*^)  produced  by  Algorithm  3.4.3,  and  integrity  constraints. 

Output:  modified  query  graph  (QG"'). 

1.  Remove  ‘R.A  ^  nuU’  such  that  A  is  a  non-nuU  attribute. 

2.  Find  all  join  paths  between  pairs  of  nodes,  such  as  Ri  and  Rj,  whose  minj,-  >  0. 

3.  For  each  join  path  found  in  Step  1, 

replace  all  joins  on  the  path  with  inner  joins. 

=  0  does  not  mean  that  mint,'  is  always  equal  to  zero.  Rather,  it  means  that  it  is  not 
known  at  the  semantic  level  whether  mini,  is  greater  than  zero. 
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As  an  example,  in  the  query  of  Programmer  object,  we  find  a  join  path  from 
Engineeri  to  Divisioni  for  which  all  three  joins  have  min  >  0.  This  is  because,  as 
shown  in  Figure  3.4,  (1)  there  are  referential  integrities  Engineeri  •  ssn  En^)! .  ssn, 
Empi.dept  Depti  .name,  Depti  .neune  i— >  Divisioni  .name,  (2)  there  are  integrity 
constraints  prohibiting  nulls  for  Engineeri. ssn,  Empi.dept,  and  Depti.name,  and 
(3)  none  of  the  relations  on  the  join  path  has  a  non-empty  filter.  We  also  find  a 
join  path  from  Proj-Assigni  to  Projecti  for  which  the  min  >  0.  All  these  joins 
are  replaced  by  inner  joins.  Note  Projecti  !XI  £11:5)2  and  Project i  [X  Sponsori 
cannot  be  replaced  by  inner  joins  because  Project  .leader  and  Project  .sponsor 
are  not  non-null  attributes. 

3.4.6  Summary  of  the  Mechanism 

Given  a  query  with  initial  left  outer  joins,  the  overall  mechanism  developed  in  Sec¬ 
tion  3.4  is  as  follows. 

Algorithm  3.4.5  (Summary) 

Input:  object  type  O,  view  (query  part  and  mapping  part),  relations  and  integrity 
constratins. 

Output;  the  query  part  prescribed  with  inner  joins,  left  outer  joins,  and  non-null 
filters. 

1.  /*  Preprocessing  */ 

(a)  Compile  the  object  type  O  and  generate  the  object  set  (Oset)  and  the  set 
of  0chain(0,  so)’s  for  all  the  attributes  defined  in  0. 

(b)  Generate  the  query  and  the  mapping  part,  AMF,  PMF,  and  PS,  from  the 
view. 

(c)  Derive  the  mappings  between  object  id’s  and  the  keys  of  pivot  relations 
using  Algorithm  3.3.1,  and  add  the  result  to  the  attribute  mapping  func¬ 
tion. 

(d)  Initialize  all  joins  of  the  query  as  left  outer  joins. 
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2.  I*  Replace  all  joins  within  derived  relations  with  inner  joins.  */ 

For  each  derived  relation  {Rb,E)  in  the  set  of  pivot  relations  (PS), 

replace  all  joins  in  E  by  inner  joins. 

3.  For  each  attribute  so  of  object  0  that  has  a  non-null  option, 

(a)  /*  Map  the  non-null  option  to  a  non-null  constraint  on  the  query  result  */ 

i.  f2o,n-5o  ~  Ochain((9,  So)  =  Oo-Oi.  •  •  •  .On. so¬ 
il.  Rp.A  :=  AMF(no,n.i<i).  /*  A  is  always  the  key  of  Rp.  */ 

iii.  If  So  is  a  simple  attribute 

then  Rg.B  :=  AMF(no,n-so) 

else  Rs.B  :=  AMF(fio,n-So.id).  /*  If  so  is  a  complex  attribute,  B  is 
the  key  of  R,.  *  j 

iv.  Output  the  non-null  constraint  ''Rp.A  R,.B\ 

(b)  /*  Generate  a  non-null  filter  and  prescribe  a  join.  */ 

i.  Replace  the  filter  f,  on  il,  by  /,  A  (B  7^  null).  /*  Generate  a  non-null 
filter.  */ 

ii.  /*  Prescribe  a  join.  */ 

A.  Find  all  directed  join  paths  from  Rp  to  R,. 

B.  For  each  path  found  in'  Step  3(b)iiA, 

replace  all  joins  on  the  path  by  inner  joins. 

4.  /*  Using  the  integrity  constraint,  remove  all  non-null  filters  which  can  be  shown 
to  be  redundant,  and  replace  left  outer  joins  if  they  prove  to  be  equivalent  to 
inner  joins.  */ 

(a)  Remove  'R.A  ^  nuU’  such  that  A  is  a  non-null  attribute. 

(b)  Find  all  join  paths  between  pairs  of  nodes,  such  as  Ri  and  Rj,  whose 
miixij  >  0. 

(c)  For  each  join  path  found  in  Step  4b, 

replace  all  joins  on  the  path  with  inner  joins. 
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Programmer! 


Figure  3.7:  The  query  graph  for  the  Prograinmer  object  with  joins  and  non- null  filters 

The  graph  of  the  query  for  the  Programmer  object,  labeled  with  joins  and  non¬ 
null  filters,  is  shown  in  Figure  3.7.  AU  the  joins  of  the  query  except  those  between 
Project^  and  Emp2  and  between  Projecti  and  Sponsori  have  been  prescribed  as 
inner  joins.  Two  non- null  filters  have  been  attached  as  the  selection  conditions  on 
the  Project  1  and  Proj-Titlei  nodes. 


3.5  Summary 

We  developed  a  mechanism  for  automatically  prescribing  inner  or  left  outer  joins 
for  the  joins  of  a  query  used  to  instantiate  objects  from  a  relational  database.  It 
also  generates  non-nuU  filters  for  some  of  the  relations  in  the  query.  We  developed 
a  rigorous  system  model  that  facilitates  the  mapping  between  objects  and  relations. 
The  system  model  consists  of  an  object  type  model,  a  view  model,  and  a  relational 
data  model.  These  models  are  based  on  a  standard  model  or  weD-known  models. 
We  added  a  few  new  components  to  the  object  type  model  and  view  model.  These 
components  are  easily  implementable  in  existing  systems. 

The  mechanism  deals  with  an  acyclic  query  graph.  At  first,  eJI  joins  in  the  query 
graph  are  initialized  to  left  outer  joins.  Then  the  following  joins  are  replaced  by  inner 
joins:  1.  Joins  within  a  derived  relation  to  provide  the  semantics  of  the  object  type; 
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2.  a// joins  lying  on  the  path  from  a  pivot  key  to  an  attribute  which  has  a  non-null 
constraint  on  the  query  result.  The  non-null  constraint  is  mapped  from  a  non-null 
option  on  an  object  attribute  and  enforces  the  semantics  of  the  non-null  option  at  the 
relation  level.  Besides,  non- null  filters  are  generated  for  a  relation  attribute  which 
has  non-null  constraints  on  it.  Finally,  the  number  of  left  outer  joins  and  non-null 
filters  is  reduced  whenever  possible  using  the  integrity  constraints  so  that  the  query 
is  processed  more  efficiently. 

Our  result  demonstrates  how  simple  the  mechanism  becomes  once  the  system 
model  is  established.  The  only  criterion  for  the  mechanism  to  use  is  the  non-nub 
option  on  object  attributes,  whose  semantics  is  mapped  to  the  non-null  constraint  on 
the  query  result.  The  system  uses  the  non-nuU  correctness  criterion  to  make  sure  that 
the  semantics  of  a  non-null  option  is  preserved,  even  if  it  prevents  other  attributes 
without  non-null  options  from  having  null  values  as  well.  The  developed  mechanism 
covers  all  possible  cases  under  the  non-null  correctness  criterion. 


Chapter  4 

Efficiently  Instantiating  Objects 


4.1  Introduction 

This  chapter  addresses  the  second  problem  of  this  thesis,  which  is  to  improve  the  per¬ 
formance  of  retrieving  objects  from  a  relational  database  residing  on  a  remote  server. 
The  key  idea  of  the  performance  improvement  is  to  reduce  the  amount  of  redundant 
data  that  the  system  should  handle  in  order  to  instantiate  objects.  We  first  formu¬ 
late  our  concrete  problem  in  Section  4.2.  Secondly,  we  develop  three  different  object 
instantiation  methods  in  Section  4.3.  One  is  the  conventional  method  of  retrieving 
a  query  result  in  the  form  of  a  single  fiat  relation  (table).  The  other  two  methods 
retrieve  a  query  result  in  structures  that  are  different  from,  and  less  redundant  than, 
a  single  fiat  relation.  Thirdly,  we  develop  the  cost  models  of  the  three  different  object 
instantiation  methods  in  Section  4.4,  and  compare  their  costs  in  Section  4.5.  It  is 
followed  by  a  summary  of  this  chapter  and  a  discussion  of  future  work  in  Section  4.6. 


4.2  Problem  Formulation 

4.2.1  Environment:  a  Remote  Main  Memory  Dta-tabase  Server 

The  client-server  architecture  is  becoming  a  standard  architecture  of  modern  comput¬ 
ing  environment  by  virtue  of  the  recent  development  of  high-speed  computer  network 
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technology.  Typically  multiple  clients  and  servers  work  in  a  request-response  mode, 
that  is,  clients  make  requests  to  servers  and  servers  make  responses  to  the  clients. 
The  concept  of  a  remote  database  server  stemmed  from  this  concept,  where  requests 
are  database  queries  and  responses  are  query  results. 


In  the  earlier  stage  of  the  chent-server  architecture,  network  communication  cost 
has  been  a  major  performance  concern  in  accessing  a  remote  database  server.  How¬ 
ever,  this  concern  is  becoming  less  meaningful  these  days  because  the  communication 
cost  decreases  rapidly  since  the  advent  of  the  high-speed  computer  network  technol¬ 
ogy  such  as  Ethernet  local  area  network  (LAN)  [66,  65]  and  NSF  wide  area  network 
(WAN)  [67,  68).  Rather,  the  dominant  cost  of  accessing  a  remote  database  server 
is  the  cost  of  query  materialization  on  the  server,  disk  access  cost  in  particular. 
This  statement  is  true  as  long  as  databases  reside  on  a  secondary  storage  device. 
Nowadays  however,  the  number  of  applications  running  on  main  memory  databases 
[78,  73,  74,  81,  75]  is  increasing  as  high  density  main  memory  chips  are  becoming 
available  at  a  lower  cost.  Here,  a  main  memory  database  indicates  that  the  entire 
database  or  an  actively  used  subset  of  a  database  fits  within  main  memory  at  the 
same  time.  (According  to  [62],  “approximately  50  -  75  %  of  all  disk  accesses  occur 
on  data  stored  on  2  -  3  %  of  the  disk  media”.)  If  a  main  memory  database  is  used, 
the  disk  access  cost  disappeaxs  or  is  incurred  r^ely,  and  hence  the  CPU  cost  and  the 
network  communication  cost  become  dominant. 


Considering  all  these  facts,  our  work  assumes  the  environment  in  which  cbents 
access  remote  database  servers  where  databases  are  relational  and  stored  in  main 
memory.  We  assume  a  situation  in  which  practically  infinite  main  memory  is  available 
so  that  no  disk  access  is  necessary  at  all  during  the  entire  object  instantiation  process. 
Here  we  emphasize  that  we  assume  the  availability  of  large  main  memory  as  the 
environment  which  can  benefit  most  from  our  work.  The  usefulness  of  the  result  of 
our  work  is  not  restricted  to  main  memory  database  systems. 


44 


CHAPTER  4.  EFFICIENTLY  INSTANTIATING  OBJECTS 


(b)  Tuples  of  relation  fragmeaits  (c)  Tuples  of  a  single  flat  relation 


Figure  4.1;  Duplicate  subtuples 

4.2.2  Motivation:  Redundant  Subtuples  of  a  Single  Flat 
Relation 

Tbere  are  two  kinds  of  redundant  subtuples  in  the  composite  tuples  of  a  query  result 
retrieved  in  the  form  of  a  single  flat  relation:  duplicate  subtuples  and  null  subtuples. 

A  single  flat  relation  contains  dupHcate  subtuples  among  the  composite  tuples. 
For  example,  let  us  consider  a  query  whose  join  graph  is  shown  in  Figure  4.1a.  Fig¬ 
ure  4.1b  shows  flve  matching  tuples  rj,  i  =  1, 2,  •  •  • ,  5  from  R2  and  five  matching  tuples 
~  1)2,  •••,5  from  R3  for  a  tuple  from  Ri.  Once  those  matching  tuples  are 
concatenated  into  composite  tuples  of  Figure  4.1c,  ti  is  duplicated  25  times  and  each 
of  r,-,  i  =  1, 2,  •  •  • ,  5  are  duplicated  5  times,  just  to  make  the  query  result  ‘flat’. 

As  we  discussed  in  Chapter  3,  left  outer  joins  are  frequently  needed  to  instantiate 
objects  from  relational  databases  through  views  [63].  If  there  are  outer  joins  in 
the  query,  the  materiahzed  single  flat  relation  will  contain  null  subtuples  for  any 
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(a)  Join  graph 
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(b)  Tuples  of  relation  fragments 
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(c)  Tuples  of  a  single  flat  relation 


Figure  4.2:  Null  subtuples 

nonmatching  subtuples.  Figure  4.2  shows  an  example  case.  Given  a  left  outer  join 
from  Ri  to  i?2,  there  is  only  one  matching  tuple  ri  from  R2.  However,  the  semantics  of 
the  left  outer  join  requires  that  null  tuples  should  be  inserted  in  place  of  R2  tuples  for 
any  dangling  tuples^  of  iZj.  Therefore,  there  appear  nine  null  subtuples  in  Figure  4.2c. 
The  amount  of  null  subtuples  become  significant  if  an  outer  join  is  followed  by  other 
(inner  or  outer)  joins. 

Those  dupHcate  subtuples  and  null  subtuples  are  inserted  just  to  make  the  rela¬ 
tion  flat  and  do  not  carry  any  additional  information.  Moreover,  they  cause  some 
disadvantages  compared  to  the  case  they  axe  not  materialized.  First  of  all,  redundant 
subtuples  incur  the  cost  of  materializing  them.  Besides,  redundant  subtuples  increase 
the  amount  of  transmitted  .data  and  thus  increase  the  communication  cost  over  net¬ 
work  without  conveying  any  more  information.  Furthermore,  the  flat  relation  must 
be  restructured  into  a  nested  relation  to  become  usable  as  objects.  In  other  words, 
we  materialize  and  transmit  redundant  subtuples  which  are  destined  to  be  eliminated 
in  a  restructuring  process.  Thus,  the  duplicate  subtuples  and  null  subtuples  incur 
the  additional  cost  of  materializing  them,  transmitting  them,  and  ehminating  them 


^That  is,  tuples  which  do  not  have  any  matching  tuples  in  R2. 
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without  any  benefit  compensating  for  these  costs.  This  observation  motivated  us  to 
look  for  alternative  methods  which  do  not  return  a  single  flat  relation  as  a  query 
result  and  thus  enable  us  to  avoid  retrieving  duplicate  or  null  subtuples. 

4.2.3  Problem  Statements 

Our  purpose  is  to  develop  alternative  methods  of  instantiating  objects  from  relational 
databases  through  views,  and  compare  their  costs  to  demonstrate  that  the  alternative 
methods  are  more  efficient  than  the  conventional  method  of  retrieving  a  single  flat 
relation.  Our  major  cost  measure  is  the  execution  time.  Required  main  memory  space 
is  another  important  measure  of  cost.  Main  memory  space  is  taken  into  consideration 
in  the  development  of  object  instantiation  methods.  However,  cost  comparison  is 
carried  out  using  time  as  the  only  cost  measure. 

Queries  are  restricted  to  acychc  select-project-conjunctive  join  queries,  in  the  same 
way  as  in  Chapter  3.  For  simpHcity,  we  consider  only  inner  joins  in  a  query  and  do  not 
consider  any  left  outer  joins  for  the  rest  of  this  chapter.  This  simpHfication  indeed 
simplifies  the  developed  algorithms  and  cost  comparison  tasks.  Nevertheless  these 
simplifications  does  not  prevent  us  from  demonstrating  that  the  alternative  methods 
are  more  efficient  than  the  conventional  method,  as  explained  now.  The  semantics  of 
a  left  outer  join  is  as  follows. 

Ri  R2  =  {Ri  R2')  U  ((.Ri  —  Ri  R2')  X  A)  (4- 1) 

where  A  is  a  null  tuple  of  R2,  i.e.,  a  tuple  consisting  of  nulls  for  each  column  of 
R2  and  X  denotes  a  semijoin.  Equation  4.1  says  that  tuples  produced  from  a  left 
outer  join  Rom  Ri  to  R2  is  equal  to  the  tuples  produced  from  an  inner  join  plus  the 
concatenation  of  the  tuples  of  Ri  which  do  not  have  matching  tuples  in  R2  and  a  null 
tuple  of  R2.  As  mentioned  in  Section  4.2.2,  inner  joins  are  the  source  of  duplicate 
subtuples  while  outer  joins  are  the  source  of  null  subtuples  contained  in  a  single  flat 
relation  query  result.  Therefore,  a  query  without  outer  joins  produces  only  duphcate 
subtuples  in  its  single  flat  relation  result  while  a  query  with  outer  joins  produces  nuU 
subtuples  as  well  as  the  duplicate  subtuples.  Therefore,  if  the  new  methods  that  will 
be  introduced  in  this  thesis  are  more  efficient  than  the  conventional  method  when  we 
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consider  inner  joins  only,  they  are  even  more  efficient  for  a  query  with  outer  joins  as 
well. 


4,2.4  Our  Approach 

First,  we  describe  three  different  object  instantiation  methods.  One  is  the  conven- 
tional  method  of  retrieving  a  query  result  in  the  form  of  a  single  flat  relation  and 
is  called  the  SFR  method.  The  second  method  retrieves  a  query  result  as  a  set  of 
relation  fragments,  and  is  called  the  RF  method.  Relation  fragments  are  materialized 
from  base  relations  by  reducing  them  with  selection,  projection,  and  join  operations. 
Relation  fragments  should  contain  all  information  required  for  restructuring  the  rela¬ 
tion  fragments  into  a  single  nested  relation.  The  third  method  retrieves  a  querj^^  result 
as  a  single  nested  relation  and  is  called  the  SNR  method.  A  single  nested  relation  is 
a  set  of  nested  tuples,  in  which  an  attribute  can  define  another  relation. 

Then,  we  develop  the  cost  models  of  the  three  different  object  instantiation  meth¬ 
ods  and  compare  their  costs.  In  the  client-server  architecture,  the  object  instantiation 
cost  is  the  sum  of  local  processing  cost  and  transmission  cost.  The  local  processing 
cost  is  the  total  execution  time  spent  on  a  server  and  a  client.  The  transmission  cost 
is  the  time  required  to  send  a  query  result  to  a  client  over  communication  network. 
Obviously,  the  transmission  cost  is  more  significant  in  the  WAN  environment  than  in 
the  LAN  environment.  Since  our  purpose  is  to  compare  the  costs  of  different  methods 
rather  than  to  estimate  the  costs  precisely,  the  cost  items  common  to  all  three  meth¬ 
ods  are  excluded  from  our  cost  model.  Besides,  we  make  necessary  simplifications  as 
long  as  the  simplifications  do  not  invalidate  the  cost  comparison  result. 


4.3  Development  of  Object  Instantiation  Meth¬ 
ods 

In  this  section,  we  first  give  an  overview  of  the  SFR,  RF,  and  SNR  method,  and  then 
give  a  detailed  description  of  each  step  of  the  three  methods.  Since  our  objective  is 
to  show  that  the  RF  method  and  the  SNR  method  are  more  efficient  than  the  SFR 
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method,  we  make  the  SFR  method  as  efficient  as  possible  to  avoid  any  bias  in  favor 
of  the  RF  method  or  the  SNR  method.  As  will  be  explained,  the  SNR  method  is 
essentially  the  same  as  the  RF  method  except  that  nesting  step  is  carried  out  by  a 
server.  Therefore,  we  first  focus  on  the  SFR  and  RF  method  in  Section  4.3.2  through 
Section  4.3.3,  and  then  discuss  the  SNR  method  separately  in  Section  4.3.4. 

4.3.1  Overview  of  the  Three  Object  Instantiation  Methods 

We  give  here  a  brief,  abstract  overview  of  the  object  instantiation  steps  of  three 
different  methods:  the  SFR  method,  RF  method,  and  SNR  method,  focusing  on  the 
distinction  among  the  methods.  First  we  describe  the  processes  of  each  step  of  the 
three  methods  and  show  an  example  of  a  single  flat  relation,  relation  fragments,  and 
a  single  nested  relation  to  help  readers  understand  the  distinction  among  them. 

4.3. 1.1  Overview  of  the  Processes 

Figure  4.3  illustrates  the  processes  of  the  three  object  instantiation  methods.  The 
overall  process  is  divided  into  three  phases:  materialization,  transmission,  and  trans¬ 
lation.  The  process  of  each  phase  is  different  for  each  method. 

SFR  method:  A  query  is  materialized  into  a  single  flat  relation  by  a  server,  trans¬ 
mitted  as  such,  and  is  translated  into  objects  by  a  client.  Translation  is  done  in  two 
steps:  nesting  and  reference  resolution.  In  the  nesting  step,  a  retrieved  single  flat 
relation  is  restructured  into  a  nested  relation  by  our  implementation  of  the  NEST 
[70]  operator.  The  reference  resolution  step  is  needed  to  resolve  references  among 
objects,  thus  configuring  the  retrieved  objects  into  a  network  of  references. 

RF  method:  A  query  is  materialized  into  a  set  of  relation  fragments  by  a  server, 
transmitted  as  such,  and  is  translated  into  objects  by  a  client.  As  in  the  SFR  method, 
translation  is  done  in  two  steps,  nesting  and  reference  resolution,  but  a  different  pro¬ 
cess  is  used  for  the  nesting  step  due  to  the  different  structure  of  retrieved  data.  Since 
a  client  receives  no  separate  information  for  linking  tuples  among  relation  fragments, 
the  first  thing  for  a  client  to  do  is  to  create  necessary  linkage  information.  In  our 
work,  it  is  done  by  creating  indexes  on  join  attributes.  Once  indexes  are  created,  joins 
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(b)  Client 

(  RF  Nesting:  Join  purge  Assembly  plaiuiing  Index  creation  Navigational  join) 


Figure  4.3:  Overall  processes  of  object  instantiation 
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are  performed  starting  from  each  tuple  of  the  pivot  relation  fragment  and  navigating 
along  the  joins  to  linked  relation  fragments.  The  result  of  the  navigational  join  is 
a  single  nested  relation,  the  same  one  that  would  be  produced  by  the  nesting  step 
of  the  SFR  method.  The  reference  resolution  step  is  the  same  as  that  of  the  SFR 
method  because  it  works  on  the  same  nested  relation. 

SNR  method:  A  query  is  materialized  into  a  single  nested  relation,  transmitted  as 
such,  and  is  translated  into  objects  by  a  client.  We  had  once  looked  into  materializing 
a  query  directly  into  a  single  nested  relation.  However,  this  direct  approach  inhibits 
join  ordering  by  a  query  optimizer  because  the  order  of  nested  subrelations  in  a  single 
nested  relation  is  not  necessary  the  same  as  the  join  order  chosen  by  a  query  optimizer. 
In  other  words,  reordering  of  joins  for  more  efficient  processing  of  the  query  can  not 
be  attempted.  Hence,  we  decided  to  take  an  indirect  approach  in  which  a  server 
first  materializes  a  query  result  using  the  SFR  or  RF  method  and  then  nest  the  query 
result  into  a  single  nested  relation. 

A  chent  does  not  have  to  do  the  nesting  step  of  translation  but  does  only  the 
reference  resolution  step.  The  reference  resolution  step  is  the  same  as  that  of  the 
SFR  method  and  RF  method.  Therefore,  the  SNR  method  uses  the  same  process  as 
the  SFR  or  RF  method  except  that  the  nesting  step  of  translation  has  been  moved  to 
a  server.  Telling  in  advance,  the  result  of  our  work  showed  that  the  nesting  of  relation 
fragments  is  cheaper  than  the  nesting  of  a  single  fiat  relation.  Figure  4.3  shows  the 
nesting  step  using  relation  fragments. 

4.3.1.2  Examples  of  a  SFR,  RF,  and  SNR 

Let  us  consider  an  example  database  containing  the  three  relations  shown  in  Fig¬ 
ure  4.4a.  Figure  4.5  shows  the  examples  of  a  single  flat  relation,  relation  fragments, 
and  a  single  nested  relation,  which  would  be  materialized  from  the  same  query  shown 
in  Figure  4.4b.  Each  of  the  relation  fragment  is  materialized  from  a  corresponding 
base  relation.  Column  values  labeled  with  an  asterisk  (*)  denote  redundant  column 
values  for  each  method. 

We  see  that  the  single  fiat  relation  contains  duplicate  subtuples.  For  example. 
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a.  Relations: 

DEPT(dno,  dname,  mgr,  sec,  loc,  parentdiv) 

EMP(eno,  ename,  salary,  dept,  esex,  degree,  ebdate,  addr) 
CHILD(parent,  ename,  ebdate,  esex,  school) 

b.  Query: 


TT  j  j  jj  i,  iifDEPT  X  EMP  X  CHILD) 

^^{dno,dname,ename,addr,cname,school}''  dbo=dept  eno=parent  ' 


Figure  4.4:  Example  relations  and  query 


three  of  the  four  “Sales”  department  names  in  the  dname  column  are  redundant  dupli¬ 
cates.  On  the  other  hand,  the  relation  fragments  contain  no  such  duplicate  subtuple. 
However,  we  note  that  the  eno,  dept  attributes  of  the  relation  fragment  from  the  base 
relation  EHP,  and  the  parent  attribute  of  the  relation  fragment  from  the  base  relation 
CHILD  have  never  been  specified  in  the  projection  set  of  the  query.  Nevertheless  these 
attributes  must  be  materialized  to  make  the  linkage  among  the  tuples  of  the  three 
relation  fragments  possible  on  a  chent.  In  other  words,  *extra’  attributes  are  materi¬ 
alized  in  addition  to  the  projection  set  in  the  query  and  are  required  to  perform  ‘joins’ 
among  the  relation  fragments  in  the  nesting  step.  Hence  we  call  those  attributes  as 
extra  join  attributes.  As  for  the  single  nested  relation,  obviously  it  contains  less  num¬ 
ber  of  redundant  subtuples  than  the  single  flat  relation,  but  the  example  shows  that 
it  still  contains  some  redundant  subtuples.  In  the  example,  Steve  works  for  both  the 
Sales  department  and  the  Purchase  department  and  therefore,  the  name  Steve,  his 
address,  and  his  child’s  name  and  child’s  school  appear  twice  in  two  different  nested 
tuples.  We  call  this  source  of  redundant  data  contained  in  a  single  nested  relation  as 
multiple  occurrences  of  subtuples. 

In  the  rest  of  this  chapter,  we  shall  adopt  the  following  notations.  We  denote  a 
single  flat  relation  as  T  (meaning  a  ‘Table’),  a  relation  fragment  as  Fi  (meaning  a 
‘Fragment’),  and  a  nested  subrelation  within  a  single  nested  relation  as  Si  (meaning 
a  ‘Subrelation’). 
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dno 

dname 

ename 

addr 

ename 

school 

16 

Sales 

Steve 

701A  Welch  Rd.,  Palo  Alto 

Tom 

Bing  Nursery  School 

16* 

Sales* 

Steve* 

701A  Welch  Rd.,  Palo  Alto* 

Mike 

Escondido  Elementary  School 

16* 

Sales* 

Ronald 

370  Hillside  Drive,  Redwood  City 

Jennifer 

Stanford  University 

16* 

Sales* 

Ronald* 

370  Hillside  Drive,  Redwood  City* 

Irene 

McDonald  High  School 

21 

Purchase 

Steve* 

701A  Welch  Rd.,  Palo  Alto* 

Tom* 

Bing  Nursery  School* 

21* 

Purchase* 

Steve* 

701A  Welch  Rd.,  Palo  Alto* 

Mike* 

Escondido  Elementary  School* 

21* 

Purchase* 

Andy 

1090  Psyche  Dr.,  Los  Altos  Hills 

Kirk 

U.C.  Berkeley 

(a)  Single  flat  relation  (SFR):  Duplicate  subtuples  (*) 


dno 

dname 

16 

Sales 

from  DEPT 

21 

Purchase 

eno 

ename 

addr 

dept 

125* 

Steve 

701 A  Welch  Rd.,  Palo  Alto 

16* 

124* 

Ronald 

370  Hillside  Drive,  Redwood  City 

16* 

125* 

Steve 

701A  Welch  Rd.,  Palo  Alto 

21* 

126* 

Andy 

1090  Psyche  Dr.,  Los  Altos  Hills 

21* 

parent 

ename 

school 

124* 

Jennifer 

Stanford  University 

124* 

Irene 

McDonald  High  School 

125* 

Tom 

Bing  Nursery  School 

125* 

Mike 

Escondido  Elementary  School 

126* 

Kirk 

U.C.  Berkeley 

from  CHILD 


(b)  Relation  fragments  (RF’s):  Extra  join  attributes  (*) 


dname  ename 

addr 

ename 

school 

: 

16 

Sales 

Steve 

701A  Welch  Rd.,  Palo  Alto 

Tom 

Bing  Nursery  School 

Mike 

Escondido  Elementary  School 

[Ronald  370  Hillside  Drive,  Redwood  City 

Jennifer 

Stanford  University 

Irene 

McDonald  High  School 

_ 

21 

Steve* 

701A  Welch  Rd.,  Palo  Alto* 

Tom* 

Bing  Nursery  School* 

Mike*  1 

Escondido  Elementary  School* 

_ 

Andy  1090  Psyche  Dr.,  Los  Altos  Hills 

Kirk 

U.C.  Berkeley 

(c)  Single  nested  relation  (SNR):  Multiple  occurrence  of  subtuples  (*)  <  duplicate  subtuples 

Figure  4.5:  Exeimples  of  a  SFR,  RF,  and  SNR 
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4.3.2  Materialization  in  the  SFR  Method  and  RF  Method 

In  the  materialization  phase  of  the  SFR  method  and  the  RF  method,  a  query  is 
materialized  into  a  single  flat  relation  or  a  set  of  relation  fragments  depending  on 
the  method.  The  materialization  phase  consists  of  two  steps:  query  processing  and 
duplicate  elimination.  In  main  memory  databases,  the  choice  of  query  processing 
strategies  [86,  87,  77,  79,  80,  82,  85]  is  based  on  the  criteria  of  the  number  of  CPU 
cycles  and  memory  space  efficiency  rather  than  the  number  of  disk  accesses  and 
disk  space  efficiency.  The  results  of  comparing  different  query  processing  strategies 
obtained  by  some  researchers  [86,  87,  77]  showed  that  hash-based  query  processing 
strategies  are  faster  than  others  when  large  main  memory  is  available.  On  the  other 
hand,  a  main  memory  database  system  used  in  OBE  [75,  85,  80]  implemented  a 
pipelined  nested  loop  join  [85,  80]  with  array  indexes  and  obtained  good  performance 
in  both  time  and  memory  space.  One  advantage  of  using  this  join  algorithm  is  that 
it  does  not  create  intermediate  relations  during  query  processing. 

4.3. 2.1  Query  Processing  for  a  Single  Flat  Relation  (SFR) 

Whichever  join  algorithm  may  be  used  for  query  processing,  a  join  between  two 
relations,  Ri  X  R2,  produces  the  following  set  of  tuples. 

{(t;t;}|ti  €  Ri,t2  €  R2,h.Vi0t2.V2,t[  =  U  Cl), 4  =  <2.(7r2  U  C2)}  (4.2) 

where  tt^,  i  =  1, 2,  denotes  the  set  of  attributes  of  Ri  that  are  specified  in  the  projection 
set  of  the  query,  and  Ci,*  =  1,2,  denotes  the  set  of  attributes  that  are  needed  for 
subsequent  join  computations.  Note  that  join  attributes  {r}i)  are  discarded  unless 
they  are  elements  of  ir,-  U  C»- 

If  we  use  the  pipelined  nested  loop  join  strategy  which  showed  successful  perfor¬ 
mance  in  OBE,  the  join  processing  algorithm  becomes  as  follows. 

Algorithm  4.3.1  (SFR  Query  processing) 

Input:  base  relations  Ri,i  —  1, 2,  •  •  • , n,  and  a  query 
Output:  a  set  of  composite  tuples  of  the  query  result. 
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Procedure: 

Let  denote  a  conjunction  of  join  predicates  between  Ri  and  J?i,  •  •  • ,  respec¬ 
tively.  Each  Ri  is  assumed  to  have  already  been  filtered  by  applicable  selection 
conditions. 

For  each  tj  €  Ri 

For  each  <2  €  R2  satisfying  $2 

For  each  tn  E  Rn  satisfying  $„ 

Output  ti.TTi  II  t2.Tr2  II  II  tn-T^n-  /*  ||  denotes  ‘concatenation’.  */ 


4.3. 2.2  Query  Processing  for  Relation  Fragments  (RF) 

We  focus  on  how  Algorithm  4.3.1  should  be  modified  to  materialize  a  set  of  relation 
fragments  instead  of  a  single  flat  relation,  rather  than  inventing  a  different  algorithm. 

First  of  all,  the  single  Output  statement  in  Algorithm  4.3.1  must  be  decomposed 
into  multiple  Output  statements,  i.e.,  one  Output  for  each  relation  fragment.  Sec- 
ondly,  join  attributes  (J7i’s)  shoidd  be  materialized  in  addition  to  Xj  U  (i  so  that  a 
cbent  can  build  indexes  on  the  join  attributes.  Thus,  a  join  between  two  relations, 
Ri  Xi  R2,  should  produce  the  following  set  of  tuples. 

{(^'1*2)  1*1  ^  Ruh  €  R2,ti.riiet2.V2,t[  =  ti.(xi  U  Cl  U7?i),t'  =  t2.(7r2  U  C2  U7?2)}  (4.3) 

where  7ji  denotes  the  set  of  join  attributes.  Accordingly,  the  Output  statement  of 
Algorithm  4.3.1  is  modified  to  Output  ii.(7riU7?i);  t2.(7r2U7?2);  •  •  * ;  in.(7rnU7/n).  Thirdly, 
a  tuple  from  an  outer  nested  loop  need  not  be  emitted  unless  it  is  a  new  tuple.  For 
example,  ti  £  Ri  in  the  outermost  loop  need  be  emitted  only  once  for  each  completion 
of  all  of  its  inner  loops.  We  can  use  switches  which  denote  whether  a  new  tuple  has 
been  obtained  from  the  outer  loop,  to  avoid  these  unnecessary  emissions. 

The  following  algorithm  shows  a  pipelined  nested  loop  join  algorithm  modified 
from  Algorithm  4.3.1  to  the  above  discussion,  i.e.,  using  multiple  output  statements, 
emitting  necessary  join  attributes,  and  using  switches  (SW’s)  to  avoid  unnecessary 
emission  of  tuples. 
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tl  t2  t3  t4  tl  t2  t3  t4 


(a)  SFR  query  processing  (b)  RF  query  processing 


Figure  4.6:  Tuples  emitted  from  base  relations 

Algorithm  4.3.2  (RF  Query  processing) 

Input:  base  relations  Ri,i  =  1,2,  -  ••  ,n,  and  a  query 
Output:  a  set  of  relation  fragments  Fj,  t  =  1, 2,  •  ■  ■ , n. 

Procedure: 

Let  denote  a  conjunction  of  join  predicates  between  Ri  and  i2i,  •  •  • ,  Ri-i,  respec¬ 
tively.  Bach  Hi  is  assumed  to  have  already  been  filtered  by  applicable  selection 
conditions. 

For  each  ti  G  i2i, 

Set  SWi. 

For  each  fa  £  Rs  satisfying  #2? 

SetSWj. 

For  each  f„  €  iJ„  satisfying 
Set  SW„. 

For  each  SWj,  i  =  1, 2,  •  •  • ,  n, 

If  SWj  is  set  then  begin 
Output  U  i7i). 

Reset  SWj. 

end 


Note  that  some  of  the  attributes  emitted  for  a  relation  fragment  are  extra  join 


56 


CHAPTER  4.  EFFICIENTLY  INSTANTIATING  OBJECTS 


attributes,  that  is,  not  specified  in  the  projection  set  of  the  query  but  are  still  needed 
to  build  the  Hnkage  among  the  relation  fragments  in  the  translation  phase.  Here 
comes  a  formal  definition  of  extra  join  attributes. 

Definition  4.3.1  (Extra  join  attributes)  Given  the  set  of  attributes  F'i  of  a  rela¬ 
tion  fragment  Fi  and  a  projection  set  V  specified  in  the  query,  —  V  is  the  set  of 
extra  join  attributes  of  F,-. 

4.3. 2.3  Tuples  Emitted  from  Query  Processing 

We  see,  by  comparing  Algorithm  4.3.1  and  Algorithm  4.3.2,  that  there  is  little  dif¬ 
ference  in  the  cost  of  the  query  processing  itself.  The  query  processings  of  both  the 
SFR  method  and  the  RF  method  execute  the  same  nested  loops.  That  is,  if  the  car¬ 
dinalities  of  the  relations  Ri,  i  =  1, 2,  •  •  • ,  n/  are  Ni,  i  =  1, 2,  •  •  • ,  both  algorithms 
take  0{NiN2  •  “  Nn^)  time.  However,  the  numbers  of  tuples  that  are  actually  emit¬ 
ted  from  each  base  relation  by  the  output  statement  are  different  in  each  algorithm. 
Figure  4.6  illustrates  this  difference.  In  Algorithm  4.3.1,  the  tuples  from  each  base 
relation  that  satisfy  all  join  conditions  are  emitted  as  a  composite  tuple  once  for  every 
innermost  loop  execution.  Therefore,  N1N2  •  •  •  Nn^  composite  tuples  are  emitted  for 
the  entire  loops.  On  the  other  hand,  in  Algorithm  4.3.2,  a  tuple  from  a  base  relation 
is  eimtted  only  if  the  execution  of  its  inner  loops  has  been  completed.  Therefore, 
Algorithm  4.3.2  never  emits  more  tuples  than  Algorithm  4.3.1. 

4.3. 2. 4  Duplicate  Elimination 

The  query  result  has  duplicates  if  there  are  duplicate  tuples  in  the  base  relations 
specified  in  the  query.  Besides,  projections  performed  in  a  query  processing  can 
produce  duplicate  tuples  in  the  query  result.  These  duplicate  tuples  result  in  duplicate 
objects  when  they  are  translated  into  objects.  We  disallow  any  duplicate  objects  to 
be  instantiated  from  a  database  because  duphcate  objects  Eire  regarded  as  separate 
objects  in  an  object-oriented  model.  Therefore,  duplicate  tuples  are  removed  from  the 
final  query  result.  Duplicate  elimination  can  be  done  either  using  sorting  or  hashing. 
Given  a  relation  of  N  tuples,  a  sorting  costs  0{N  log2  N)  and  a  hashing  costs  0{NIB) 
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Figure  4.7:  The  structure  of  a  chained  bucket  hashing  for  duplicate  elimination 

where  B  is  the  size  of  a  bucket  header.  It  is  anticipated  that  hashing  be  faster  than 
sorting.  Another  advantage  of  hashing  is  that  hashing  can  be  pipelined  between 
query  processing  and  transmission  because  hashing  is  processed  tuple  by  tuple.  That 
is,  each  tuple  of  the  final  query  result  can  be  hashed  immediately  as  soon  as  it  is 
available  and  a  tuple  which  has  no  duplicate  in  the  already  hashed  set  of  tuples  can 
be  transmitted  immediately.  On  the  other  hand,  if  sorting  is  used,  all  tuples  of  the 
final  query  result  must  be  collected  before  sorting  can  start. 

We  use  a  simple  chained  bucket  hashing  [94]  (alias,  an  open  hashing  [95]),  whose 
structure  is  shown  in  Figure  4.7.  The  bucket  header  is  an  array  of  pointers  to  chains 
of  buckets.  Each  bucket  in  a  chain  is  a  record  of  two  entries  -  a  tuple  and  a  pointer 
to  the  next  bucket.  Given  this  structure,  the  algorithm  for  eliminating  dupHcates  in 
pipelining  with  transmission  becomes  as  follows. 

Algorithm  4.3.3  (Duplicate  elimination) 

1.  Allocate  a  hashing  bucket  header. 

2.  For  each  tuple  to  output  from  the  query  processor, 

(a)  Compute  a  hashed  address  h{to)  using  the  entire  tuple  as  the  input  where 
h  is  a  hashing  function. 
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(b)  i.  Traverse  the  chain  of  buckets  starting  from  the  header  located  at  the 

address  h{to). 

For  each  bucket  visited, 

A.  Compare  the  tuple  to  with  the  tuple  tb  contained  in  the  bucket. 

B.  If  to  =  tb  then  go  to  Step  2c. 

ii.  /*  No  same  tuple  as  to  was  found  in  the  chain.  */ 

Insert  a  new  bucket  containing  to  into  the  chain  and  transmit  tg. 

(c)  Continue. 

In  Algorithm  4.3.3,  actual  tuples  are  stored  in  hashing  buckets.  We  can  reduce  the 
memory  space  allocated  for  hashing  buckets  if  we  store  pointers  to  the  tuples  of  base 
relations  instead  of  the  tuples  emitted  from  a  query  processor.  This  method  certainly 
reduces  the  memory  space  allocated  for  buckets,  but  requires  the  reading  of  base 
tuples  and  projection  on  them  every  time  a  query  result  tuple  is  to  be  compared  with 
the  bucket  entry.  We  will  assume  the  storage  of  actual  tuples  for  our  cost  modeling. 

4.3.3  Translation  in  the  SFR  Method  and  RF  Method 

In  the  translation  phase,  a  received  query  result  is  restructured  into  objects  that  can 
be  used  by  the  apphcation.  The  translation  process  depends  on  the  structure  of  the 
objects  defined  by  the  object  model.  In  our  work,  objects  realize  the  aggregation  hi¬ 
erarchy  [88,  89]  through  nested  structure  and  references  among  objects.  It  motivated 
us  to  design  the  complete  process  of  translation  in  two  steps,  nesting  and  reference 
resolution,  as  mentioned  in  Section  4.3.1.  In  the  nesting  step,  we  restructure  the 
retrieved  relation  or  relation  fragments  into  a  nested  relation.  In  the  reference  reso¬ 
lution  step,  references  among  objects  are  resolved  by  making  pointer  Hnkages  among 
the  nested  relations. 

Figure  4.8  and  Figure  4.9  illustrate  the  nesting  processes  of  the  SFR  method 
and  the  RF  method,  respectively.  The  nesting  step  is  carried  out  differently  for  the 
SFR  method  and  the  RF  method.  In  the  SFR  method,  it  is  done  by  decomposing 
received  tuples  into  subtuples  corresponding  to  different  nested  subrelations,  and 
assembling  the  decomposed  subtuples  into  nested  tuples.  On  the  other  hand  in  the 
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Figure  4.8:  SFR  nesting  process 


RF  method,  it  is  done  by  creating  indexes  on  the  join  attributes  of  the  relation 
fragments  and  performing  navigational  join.  Navigation  starts  from  the  pivot  relation 
fragment  and  follows  the  joins  of  the  query  to  find  matching  tuples  in  the  joined 
relation  fragments.  At  least  one  matching  tuple  always  exists  in  each  relation  fragment 
because  the  relation  fragments  have  already  been  fully  reduced  by  the  same  join 
operations  on  a  server.  The  matching  tuples  thus  found  are  assembled  into  nested 
tuples  according  to  an  assembly  plan  generated  by  comparing  the  join  tree  and  the 
nesting  format  tree.  The  join  purge  step  chooses  only  one  of  the  conjunctive  join 
predicates  from  each  join  in  the  join  tree. 

The  reference  resolution  step  is  out  of  our  scope  because  its  process  is  specific  to 
the  object  schema  defined  by  the  application.  Besides,  omitting  this  step  does  not 
affect  the  cost  comparison  result  because  the  reference  resolution  processes  of  the  RF 
method  and  SFR  method  are  identical. 
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Join  tree  Purged  join  tree 


Single  nested  relation 


Figure  4.9:  RF  nesting  process 
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4.3.3. 1  Generation  of  a  Nesting  Format 

A  nesting  format  [72]  is  a  string  interpreted  as  the  schema  of  a  nested  relation.  It 
provides  the  information  necessary  to  restructure  a  flat  relation  or  a  set  of  relation 
fragments  into  a  nested  relation.  We  denote  the  nesting  of  subrelations  with  paren¬ 
theses  such  as  A(BC)(D(E))^. 

We  can  build  an  0-tree,  given  the  type  definition  of  an  object  0.  An  example 
is  shown  in  Figure  4.10c  for  the  object  type  shown  in  Figure  4.10a.  Given  this  0- 
tree  and  the  attribute  mapping  function  whose  mapping  is  shown  in  Figure  4.10c, 
the  nesting  format  for  restructuring  the  result  of  the  query  shown  in  Figure  4.10b  is 
generated  in  the  following  procedure.  (Object  type,  0-tree,  and  attribute  mapping 
function  have  been  defined  in  Part  I.) 

Algorithm  4.3.4  (Generation  of  a  nesting  format) 

Input:  0-tree,  and  attribute  mapping  function  (AMF) 

Output:  a  nesting  format 
Procedure: 

1.  Starting  from  the  root  of  the  0-tree,  recursively  replace  each  node  by  the  list 
of  its  children. 

2.  Replace  each  object  attribute  in  the  list  produced  by  Step  1  by  the  correspond¬ 
ing  relation  attribute  mapped  by  the  AMF. 

3.  Strip  off  the  outermost  parentheses  from  the  list  produced  in  Step  2. 

Figure  4.10c  shows  an  example  of  the  mapping  between  object  attributes  and 
relation  attributes  appearing  in  a  query.  Let  us  assume  the  schema  of  the  retrieved 
relation  is  KADEHGIJ.  Step  1  of  Algorithm  4.3.4  generates  (oidAo(DoEo(HoGo))(IoJo)), 
which  is  replaced  by  (KA(DE(HG))(IJ))  in  Step  2  and  becomes  KA(DE(HG))IJ  in  Step  3. 
This  format  becomes  the  schema  of  the  nested  relation.  For  example,  HG  is  mapped 
to  HoGo  which  is  nested  within  a  complex  attribute  Fo-  Fo  is  nested  within  a  complex 
attribute  Bo,  which  is  an  attribute  of  O.  These  two  levels  of  nesting  is  reflected  by 
the  two  levels  of  parentheses  in  the  generated  nesting  format. 


^In  [72],  the  same  format  is  denoted  as  A(BC)  *  (D(E)*)*. 
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a.  Object  type:  (‘simple’  denotes  a  literal  such  as  an  integer,  string,  etc.,  or  an 
object  id  (oid).  Its  value  can  be  an  atomic  literal  or  a  set  of  atomic  literals.) 

Type  0 
[Ao  :  simple, 

B„:  [Do',  simple,  Eo'-  simple, 

Fo'.  [Ho',  simple.  Go'-  simple]  ], 

Co'-  {I o'-  simple,  Jo-  simple]] 


b.  Query: 

•  Select-project-join  expression: 


nKADEHGlj(<^iKAD'M  IX  C72DG'EN  tX  (JaHGP  tX  C74A'IL'Q  CX  (J5LJS) 

D'«D  G'tfG  k0i'  l'SL 

where  6  e  {=,  <,  <,  >,  >}. 

•  Join  graph  of  the  query: 


c.  0-tree,  and  mapping  of  its  attributes  to  relation  attributes:  (0  is  the  object 
type,  oid  denotes  an  object  id,  and  the  other  capital  letters  subscripted  with  o 
denote  object  attributes.) 


O-tree 


attribute  mapping 


relation  attributes 
(projection  set) 


Figure  4.10:  An  example  of  object  type,  view,  and  O-tree 
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a.  Flat  relation  schema:  KADEHGIJ 

b.  Nesting  format:  KA(DE(HG))(IJ) 

c.  Nesting  format  tree: 


Figure  4.11:  An  example  of  a  nesting  format  and  its  nesting  format  tree 

We  can  draw  out  the  hierarchy  of  nested  subrelations  from  a  nesting  format.  An 
example  is  shown  in  Figure  4.11.  The  root  of  the  tree  represents  a  subrelation  which 
is  not  nested  within  any  other  subrelation,  and  its  descendents  represents  subrelations 
nested  within  their  parents.  We  call  such  a  tree  as  a  nesting  format  tree.  In  particular, 
the  subrelation  represented  by  the  root  is  called  a  pivot  subrelation  because  the  root 
always  contains  an  attribute  which  is  mapped  to  an  oid. 

4.3.3. 2  The  Structure  of  a  Single  Nested  Relation 

For  both  the  SFR  method  and  the  RF  method,  searching  is  required  every  time  a 
tuple  is  to  be  inserted  into  an  output  single  nested  relation.  The  tuple  is  inserted  only 
if  there  does  not  exist  the  same  tuple  in  the  single  nested  relation.  Hence,  the  number 
of  searchings  performed  is  always  greater  than  the  number  of  insertions  performed.  In 
particular,  a  large  portion  of  tuples  that  are  attempted  for  insertion  are  discarded  for 
the  SFR  method  if  the  number  of  duplicate  subtuples  in  a  single  flat  relation  is  large. 
Considering  these  facts,  the  structure  of  a  single  nested  relation  was  determined  to 
show  good  searching  performance. 
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Figure  4.12:  The  structure  of  a  single  nested  relation 
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Figure  4.12  shows  the  structure  of  a  single  nested  relation  used  for  our  work. 
Each  nested  subrelation  is  implemented  as  a  binary  search  tree.  The  root  SNR  node 
contains  only  the  pointer  to  the  root  of  the  binary  search  tree  representing  the  pivot 
subrelation.  Each  node  of  a  binary  search  tree  contains  a  tuple  value,  pointers  to  the 
roots  of  the  nested  binary  search  trees,  and  pointers  to  its  left  child  and  right  child. 
Both  searching  and  insertion  of  a  tuple  within  a  nested  subrelation  take  O(log2  N) 
time  where  N  is  the  number  of  tuples  currently  inserted  in  a  binary  search  tree. 

4.3.3.3  Nesting  of  a  Single  Flat  Relation 

In  [70],  NEST  was  introduced  as  an  operator  for  restructuring  a  flat  relation  into  a 
nested  relation.  Similar  concepts  were  also  described  in  [71,  72].  Our  nesting  process 
described  here  is  an  instance  of  implementing  the  NEST  operator. 

Figure  4.13  shows  an  example  of  the  relation  instance  before  and  after  the  nesting 
step.  The  single  flat  relation  of  Figure  4.13a  was  obtained  by  evaluating  the  query 
in  Figure  4.10b  on  a  set  of  relation  instances.  In  Figure  4.13b,  IJ  is  independent  of 
DEHG,  but  is  dependent  only  on  KA.  SFR  nesting  can  be  performed  pipelined  with 
the  reception  of  the  tuples  from  a  server.  Each  received  tuple  is  decomposed  into 
subtuples  where  each  subtuple  is  an  instance  of  each  node  of  the  nesting  format  tree. 
Each  subtuple  is  then  inserted  into  an  output  single  nested  relation.  Since  there  may 
exist  duplicate  subtuples  in  a  single  flat  relation,  it  must  be  checked  before  insertion  if 
there  already  exists  the  same  subtuple  in  the  single  nested  relation.  Figure  4.12  shows 
the  result  of  inserting  the  first  three  tuples  of  the  single  flat  relation  of  Figure  4.13a 
into  an  empty  single  nested  relation.  Figure  4.13b  shows  the  nested  tuples  of  the  final 
single  nested  relation  schematically. 

An  algorithm  for  the  SFR  nesting  step  is  as  follows. 

Algorithm  4.3.5  (SFR  Nesting) 

Input:  received  tuples  of  a  single  flat  relation,  and  a  nesting  format  tree  (NFT). 
Output:  A  single  nested  relation  (SNR). 

Procedure: 

1.  Allocate  an  empty  (root  only)  single  nested  relation  SNR. 
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K  A  D  E  H  G  I  J 

k2  02  ^4  62  h,2  p3  ti  ji 

ki  Cl  d2  62  ^4  92  *2  j4 

^3  0,3  ds  64  hi  gi  Z2 

k2  02  ds  64  hs  Pi  ii  jj 

ki  ai  ^2  ^2  ^4  92  ^2  32 

ks  03  ds  64  hs  Pi  12  j2 

All  Oi  d2  62  A.2  Ps  22  34 

^3  O3  ds  64  Pi  22  J4 

/tl  Cl  d2  62  /22  9z  *2  32 

ks  as  (is  64  /15  22  74 

^2  as  cis  C4  9i  ^1  ii 

(a)  Retrieved  single  flat  relation  (before  nesting  starts) 


a:  A  {D  E  {H  G))  (/  J) 

^2  0-2 

^^4  ^2 

^2  92 

ds  64 

hi  Pi 
^5  Pi 

ii  ji 

Ai  tti 

d2  62 

^2  P3 
^4  S2 

*2  i2 

*2  j4 

^3  tts 

ds  64 

hi  Pi 

Pi 

*2  i2 

^2  >4 

(b)  Single  nested  relation  (after  nesting  completes) 


(c)  Nesting  format  tree 

Figure  4.13:  An  example  of  nesting  a  single  flat  relation 
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2.  Wp  :=  the  root  of  the  empty  SNR. 

3.  Up  :=  the  root  of  NFT. 

4.  For  each  tuple  U  received  from  a  server, 

Assemble('u;p,Up,n„j,tr).  /*  Project  U  on  Up.  */ 

In  Algorithm  4.3.5,  Assemble(tt;i,ttj,  t,)  inserts  a  tuple  tj  into  a  binary  search  tree 
whose  root  is  the  node  pointed  by  Wi.Ui  -  the  Ui  field  of  an  insertion  entry  node  u;,. 
This  binary  search  tree  belongs  to  a  nested  subrelation  Si  corresponding  to  the  node 
Ui  of  the  nesting  format  tree. 

Algorithm  4.3.6  (Assemble_SFR) 

Input:  a  node  (iwi)  of  SNR,  a  node  (ui)  of  NFT,  and  a  tuple  U  to  be  inserted. 
Output:  SNR  with  U  inserted  if  U  is  new. 

Procedure: 

1.  Wr  :=  the  node  pointed  by  Wi.Ui. 

/*  Wr  is  the  root  of  a  binary  search  tree  to  be  searched.  */ 

2.  If  K  :=  Search(u;„ti))  =  NOTJOUND 

then  Insert-tuples(u;i ,  Ui,  U) 
else  /*  There  exists  U  already.  */ 

(a)  ^  :=  the  set  of  Ui’s  children  {uc)  in  NFT. 

(b)  If  ’i'  =  {  }  then  Return 
else 

For  each  Uc  €  ’4^, 

Assemble(it;c,«c)n„^tr).  /*  Project  t,  on  Uc.  */ 

In  Algorithm  4.3.6,  Search(«;r,  U)  finds  a  SNR  node  whose  tuple  value  =  U  among 
the  nodes  of  the  binary  search  tree  rooted  by  Wr,  and  returns  NOTJOUND  if  no 
tuple  ti  is  found  or  returns  the  SNR  node  containing  the  tuple  U  if  one  is  found. 

Algorithm  4.3.7  (Search) 

Input:  A  node  (lUj)  of  a  binary  search  tree  and  a  tuple  ti  to  be  searched  for. 
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Output:  Return  NOT_FOUND  or  a  found  node. 

Procedure: 

If  Wi  =  nil  then  return  NOT_FOUND 
else  if  Wi-tuple  =  t,-  then  return  Wi 

else  if  (wi. tuple  <  U)  then  return(Search(u;i.LChild,  ti)) 
else  return(Search(u;i.RChild,  U)). 

Insert-tuples(wj, Ui, ti)  inserts  tuple  ti  into  the  binary  search  tree  whose  root  is 
the  node  pointed  by  Wi.Ui,  and  also  inserts  all  of  t,’s  nested  subtuples  corresponding 
to  u,’s  descendents  obtained  from  the  nesting  format  tree. 

Algorithm  4.3.8  (Insert-tuples) 

Input:  A  node  (wi)  of  SNR,  a  node  (ui)  of  NFT,  and  a  tuple  ti  to  be  inserted. 
Output:  ti  is  inserted  into  the  nested  subrelation  whose  root  is  pointed  by  Wi.Ui,  and 
aU  of  t,-’s  nested  tuples  are  inserted  into  Ui’s  nested  subrelations. 

Procedure: 

1.  Wf  ;=  Insert(u;i,Ui,t,).  /*  Insert  the  tuple  U.  */ 

2.  /*  Insert  ti’s  nested  subtuples.  */ 

:=  {t6c|uc  is  a  child  of  Ui  in  the  NFT.} 

If  $  =  {  }  then  Return 
else 

For  each  Uc  € 

Insert-tuples(t[;e,  Uc,  U-Uc). 

Insert(u;i,  Ui,  ti)  inserts  t,-  into  the  binary  search  tree  whose  root  is  pointed  by  Wi-Ui 
and  returns  the  inserted  node. 

Algorithm  4.3.9  (Insert) 

Input:  A  node  (lUi)  of  SNR,  a  node  (u^)  of  NFT,  and  a  tuple  ti  to  be  inserted. 
Output:  Returns  the  inserted  node. 

Procedure: 
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Allocate  an  empty  node  w*. 

Copy  U  to  tUe-tuple. 
w  ;=  the  node  pointed  by  Wi.Ui 

/*  w  is  the  root  of  the  binary  search  tree  into  which  Wi  is  to  be  inserted.  */ 

If  k;  =  nil  /*  An  empty  nested  subrelation.  */ 

then  Wi.Ui  :=  the  address  of  Wg.  /*  Insert  w^.  */ 
else  while  We  is  not  inserted  begin 
if  ti  <  tn. tuple  then 

if  (tn  :=  the  node  pointed  by  w.lc)  =  nil 
then  insert  We  as  the  left  child  of  w 

else 

if  {w  :=  the  node  pointed  by  tn.rc)  =  nil 
then  insert  as  the  right  child  of  w 

end 

Return  Wg. 

4.3.3. 4  Index  Structure  for  Relation  Fragments:  Chained  Bucket  Hashing 

Since  no  linkage  information  among  the  relation  fragments  is  retrieved  from  a  server 
in  the  RF  method,  a  chent  has  to  build  necessary  linkage  information  using  the 
received  relation  fragments.  Our  method  is  to  create  indexes  on  the  join  attributes 
of  the  relation  fragments.  In  the  main-memory  resident  environment,  the  choice  of 
an  appropriate  index  structure  is  based  on  the  criteria  of  the  number  of  CPU  cycles 
and  memory  space  efficiency.  In  [76,  77],  Lehman  et  al.  showed  the  performance 
comparison  of  different  index  structures  of  the  following  category. 

•  Order-preserving  indexes:  Array  [74],  AVL  Tree  [90],  B  Tree  [91],  T  Tree  [76,  77]. 

•  Randomizing  (hashing)  indexes:  Linear  Hashing  [93],  Modified  Linear  Hashing 
[76,  77],  Extended  Hashing  [92],  Chained  Bucket  Hashing. 

The  performance  was  compared  for  the  index  insertion  (or  equivalently,  creation), 
random  search,  a  query  of  mixed  operations,  range  query,  scan,  deletion,  and  for  the 
index  memory  utilization  -  the  ratio  between  the  memory  allocated  and  the  memory 
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Figure  4.14:  The  structure  of  a  chained  bucket  hashing  index 


actually  containing  data.  According  to  their  result,  the  Chained  Bucket  Hashing 
shows  the  best  performance  in  aU  the  above  operations  except  the  range  quer}',  for 
which  any  kind  of  hashing  index  is  inappropriate.  Since  we  need  only  creation  and 
random  search  of  indexes,  the  inability  of  supporting  range  queries  causes  no  problem. 
The  Chained  Bucket  Hashing  requires  approximately  1.2  to  1.5  times  more  memory 
than  the  other  indexes.  We  assume  this  is  hot  significant  and  do  not  worry  about 
memory  overflow  because  indexes  are  built  on  relation  fragments  which  have  already 
been  fully  reduced  before  being  transmitted.  We  thus  choose  to  use  the  Chained 
Bucket  Hashing  index. 

Figure  4.14  shows  the  structure  of  a  chained  bucket  hashing  index  used  in  our 
work.  It  is  configured  of  a  bucket  header  table  and  chained  buckets  linked  to  each 
header.  Note  that,  unlike  the  chained  bucket  hashing  of  Figure  4.7,  each  bucket 
header  and  chained  bucket  contains  a  pointer  to  a  tuple  instead  of  an  actual  tuple 
[76,  77].  Storing  pointers  reduces  the  main  memory  space  allocated  for  the  hashing 
table.  Those  pointers  are  used  to  extract  attribute  values  when  needed.  We  pay  the 
price  of  additional  pointer  followings  rather  than  duplicating  the  tuples  of  relation 
fragments  in  the  buckets.  It  was  observed  in  [76]  that  the  Chained  Bucket  Hashing 
index  organized  in  this  structure  shows  the  best  storage  cost /performance  ratio  when 
the  size  of  the  bucket  header  table  is  approximately  a  half  of  the  number  of  tuples  to 
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be  indexed. 

4.3. 3. 5  Nesting  of  Relation  Fragments 

Nesting  of  retrieved  relation  fragments  is  performed  in  four  steps:  join  purge,  assembly 
planning,  index  creation,  and  navigational  join.  Figure  4.15  shows  an  example  of  the 
data  structure  in  each  step  of  nesting  relation  fragments.  The  relation  fragments  were 
obtained  from  the  same  query  (Figure  4.10b)  that  was  used  to  obtain  the  single  flat 
relation  of  Figure  4.13a. 

4.3.3. 5.1  Join  purge  In  the  join  purge  step,  we  remove  any  redundant  join  pred¬ 
icates  from  the  joins  specified  in  the  query,  leaving  only  the  minimal  number  of  joins. 
A  conjunction  of  join  predicates  in  a  query  can  be  reduced  to  a  single  join  predicate 
by  choosing  one  of  them  arbitrarily.  This  reduction  does  not  affect  the  result  of  the 
nesting  step.  The  following  theorem  shows  it. 

Theorem  4.3.1  Let  us  consider  a  conjunctive  join  predicate  AidiBi  A  A2O2B2  A 
•  •  •  A  AnSnPn  between  two  relation  fragments  Fi  and  F2  that  have  been  retrieved 
from  a  server  in  the  RF  method.  Then,  for  an  arbitrary  pair  of  tuples  where 

€  Fi,  t2  €  F2, 


{t1.A161t2.B1)  A  {t1.A2S2t2.B2)  A  •  •  •  A  {t1.An6j1t2.Bn)  (4-4) 

if  and  only  if 

t1.Ai6it2.Bi  for  some  i  G  [l,n]  (4-5) 

Proof:  Since  the  ‘only  if’  part  is  obvious,  we  prove  only  the  ‘if’  part. 

If  part:  Let  us  assume  Equation  4.4  is  not  satisfied  although  Equation  4.5  is  satisfied. 
Then,  there  exists  at  least  one  j  €  [1,  n]  such  that  j  ^  i  and  -‘{t1.Aj6jt2.Bj).  However, 
if  t1.Aj6jt2.Bj  is  false,  ti  0  Fi  if  <2  ^  F2  and  *2  ^  F2  if  ti  ^  Fi  by  the  definition  of 
join.  It  contradicts  with  the  given  assumption  that  ti  €  Fi  and  <2  €  F2.  Q.E.D. 

It  will  be  good  in  practice  to  select  the  join  predicate  which  takes  the  minimum 
computation  time,  such  as  an  equijoin  on  integer  attributes. 
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(c)  After  navigational  join  (nesting  completes) 

Figure  4.15:  An  example  of  nesting  a  set  of  relation  fragments 
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TT :  projection  :  join 


Figure  4.16:  An  example  of  an  assembly  plan 


Since  we  axe  dealing  with  an  acyclic  query,  the  join  graph  of  a  query  is  always  a 
tree  rooted  by  the  pivot  relation  fragment.  From  now  on,  we  use  the  term  join  tree 
interchangeably  with  join  graph. 


4.3.3.5.2  Assembly  planning  In  this  step,  a  plan  of  how  to  assemble  the  tuples 
which  will  be  collected  from  the  navigational  joins  is  set  up.  An  assembly  plan  is  a 
transformation  from  the  nodes  of  a  join  tree  to  the  nodes  of  a  nesting  format  tree. 
Figure  4,16  contrasts  the  join  tree  and  the  nesting  format  tree  for  the  object  and  view 
shown  in  Figure  4.10.  As  illustrated  in  Figure  4.16,  a  node  of  a  nesting  format  tree  is 
obtained  from  one  or  more  nodes  of  a  join  tree  via  relational  projections  and  joins.  A 
node  of  a  join  tree  represents  a  relation  fragment  while  a  node  of  a  nesting  format  tree 
represents  a  nested  subrelation  of  a  single  nested  relation.  Joins  are  needed  only  if  a 
node  of  the  nesting  format  tree  has  a  schema  which  is  not  a  subset  of  the  schema  of 
any  relation  fragment  but  is  split  into  the  schemas  of  two  or  more  relation  fragments. 
For  example,  the  IJ  node  is  split  into  two  relation  fragments  A^IL'  and  LJ.  A  join  is 
needed  to  merge  the  relation  fragments  A^IL^  and  LJ  into  the  nested  subrelation  IJ. 
Projections  are  used  to  remove  the  extra  join  attributes.  Note  that  there  must  exist 
one  and  only  one  matching  tuple  of  LJ  for  each  tuple  of  A^IL^  Otherwise  I  and  J 
cannot  belong  to  the  same  node  of  the  nesting  format  tree. 
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In  the  following  discussion,  we  denote  a  node  of  a  join  tree  as  n,-  and  a  node  of 
nesting  format  tree  as  Uj.  Note  there  is  a  one-to-one  mapping  between  the  set  of 
relation  fragments  {Fj}  and  the  set  of  the  nodes  {ui}  of  a  join  tree,  and  between  the 
set  of  nested  subrelations  {Si}  and  the  set  of  the  nodes  {ui}  of  a  nesting  format  tree. 
Let  us  introduce  here  two  functions  defining  these  one-to-one  mappings  —  RFJT  from 
{Fj}  to  {ui}  and  NSRNFT  from  {Sj}  to  {uj}  -  for  later  use. 

An  assembly  plan  is  defined  as  a  set  of  expressions  of  the  following  form. 

u  :=  nu(t;i  Xu2  •  •  •  tXufc) 

where  n„  denotes  the  projection  on  the  schema  of  u.  For  example,  we  obtain  the 
following  assembly  plan  from  the  join  tree  and  nesting  format  tree  of  Figure  4.16. 

Example  4.3.1  (Assembly  plan) 

KA  :=  IIkaKAD' 

DE  :=  IIdeDG^E 
HG  :=  HG 

IJ  :=  nij(A'lL^  X  LJ) 

L'flL 

□ 

In  a  more  abstract  form,  we  consider  an  assembly  plan  AP  as  a  function  of  the  nodes 
of  a  join  tree  (or  equivalently,  relation  fragments),  i.e.,  u  =  AP(vi,t;2,  •  ■  •  ,Vk).  We  use 
the  same  function  AP  for  both  the  schema  of  the  nodes  of  the  join  tree  and  nesting 
format  tree  and  their  instanpe  tuples.  For  example,  AP(A'1L',  LJ)  returns  IJ  and 

AP(a3i2Z3,/3j2)  returns  22^2 • 

The  algorithm  for  generating  an  assembly  plan  is  as  follows. 

Algorithm  4.3.10  (Assembly  planning) 

Input:  a  join  tree  (JT),  a  nesting  format  tree  (NFT). 

Output:  an  assembly  plan  (AP). 

Procedure; 
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1.  For  each  node  v  newly  visited  (not  marked  as  ‘visited’)  while  traversing  JT 
starting  from  the  root, 

(a)  Find  a  node  u  in  NFT  whose  schema  is  a  subset  of  the  schema  of  v. 

(b)  If  a  node  u  is  found  then 

i.  H  u  =  V  then  add  u  :=  v  to  AP 
else  add  u  :=  n„t>  to  AP 

ii.  Mark  v  as  ‘visited’, 
else 

i.  Find  the  set  of  nodes  {y  =  Vi,V2,  •  •  •  ,Vk}  of  a  minimal  subtree  of  JT 
rooted  by  v  such  that  the  union  of  the  schema  ofvijV^,  -  •  •  ,Vk  contains 
the  schema  of  u. 

ii.  Add  u  :=  n„(ni  X'y2  •  •  •  Xn*.)  to  the  AP. 

iii.  Mark  ,  ^2,  •  •  ■ ,  r*  as  ‘visited’. 

4.3.3. 5.3  Index  creation  Once  redundant  joins  axe  removed,  indexes  are  created 
on  the  join  attribute  of  each  relation  fragment  except  the  pivot  relation  fragment.  For 
example,  given  the  relation  fragments  of  Figure  4.15,  indexes  are  created  on  DG'E.D, 
HG.G,  A'lL'.A,  and  LJ.L.  KAD'  is  the  pivot  relation  fragment.  Index  creation  can  start 
only  when  the  entire  tuples  of  a  relation  fragment  are  available  because  a  hashing 
index  requires  the  number  of  indexed  tuples  to  be  known  before  an  index  is  created. 
Since  the  tuples  of  relation  fragments  are  transmitted  in  row- wise  order,  i.e.,  different 
tuples  from  different  relation  fragments  are  intermixed,  the  index  creation  on  relation 
fragments  can  start  oidy  after  all  relation  fragments  are  received.  Given  the  structure 
of  the  chained  bucket  hashing  index  described  in  Section  4.3. 3.4,  the  algorithm  of 
creating  an  index  is  as  follows. 

Algorithm  4.3.11  (Index  creation  using  hashing) 

Input;  a  relation  fragment  Fi,  and  a  join  attribute  Ai  of  Fi. 

Output:  a  chained  bucket  hashing  index  on  the  attribute  Ai  of  F^. 

Procedure: 
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1.  Allocate  a  bucket  header  table. 

2.  Scan  the  column  A,-  of  Fi  linearly. 

For  each  scanned  value  of  Fi.Aj, 

(a)  Compute  the  hashed  address  h[Fi.Ai)  where  is  a  hashing  function. 

(b)  Insert  the  value  of  Fi.Ai  into  the  hashing  index  at  the  address  h{Fi.Ai). 
/*  No  duphcate  checking  is  done.  */ 

4. 3. 3. 5. 4  Navigational  Join  Once  indexes  are  created  and  an  assembly  plan  is 
generated,  we  perform  navigational  joins  on  the  relation  fragments.  The  navigational 
join  starts  from  each  tuple  of  the  pivot  relation  fragment  and  follows  the  joins  of 
the  join  tree  to  find  matching  tuples  from  all  relation  fragments.  For  example  in 
Figure  4.15,  we  perform  joins  starting  from  each  tuple  of  KAD'  and  find  matching 
tuples  from  DG'E  and  A'lL'  respectively.  Then,  for  each  matching  tuple  of  DG'E  and 
A'lL'  found  in  previous  joins,  matching  tuples  are  found  from  HG  and  LJ  respectively. 
Note  that  there  always  exist  one  or  more  matching  tuples  because  non-matching 
tuples  have  already  been  discarded  in  the  materialization  phase. 

The  set  of  matching  tuples  thus  found  are  assembled  into  a  nested  tuple  according 
to  the  assembly  plan  generated  by  Algorithm  4.3.10.  For  example,  starting  from  the 
third  tuple  of  KAD^,  [k^asds],  the  following  set  of  matching  tuples  are  found  from  each 
relation  fragment  as  the  result  of  navigation. 

Example  4.3.2  (Matching  tuples  [kaasds]) 

•  [fcacads]  from  KAD'. 

•  from  DG'E. 

•  [kigi],  [^5^i]  from  HG. 

•  [“3*2^  from  A'lL'. 

•  [/3i2],  [l3;4]  from  LJ. 

□ 
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These  tuples  are  assembled  into  one  nested  tuple  in  the  last  row  of  Figure  4.15c.  Given 
the  assembly  plan  shown  in  Example  4.3.1,  [/eaCads]  is  projected  on  the  projection 
set  KA,  [^5^164]  is  projected  on  DE.  [hipi]  and  [hgi]  are  not  projected  because  their 
projection  set  is  the  same  as  the  schema  of  the  relation  fragment  HG.  [aaia^s]  from  A'lL' 
is  merged  with  [/ai2]  and  [/ai4]  from  LJ  respectively,  and  projected  on  IJ  to  produce 
[iaia]  and  [12^4]-  Duplicate  checking  is  required  when  the  tuples  are  assembled  into 
a  nested  tuple  because  projections  may  produce  duphcate  subtuples.  Carrying  out 
navigational  joins  in  this  way  for  all  tuples  of  the  pivot  relation  fragment,  we  obtain 
the  nested  relation  shown  in  Figure  4.15c. 

The  following  algorithm  describes  the  procedure  of  a  navigational  join  more  rig¬ 
orously. 

Algorithm  4.3.12  (Navigational  join) 

Input:  relation  fragments  Fi,!  =  1, 2,  •  •  •  ,n(Fi-  is  the  pivot  relation  fragment.);  in¬ 
dexes  on  the  join  attributes  of  Fi,i  =  2,3,  •  •  •  ,n;  a  join  tree  (JT),  a  nesting  format 
tree  (NFT),  and  an  assembly  plan  (AP). 

Output:  a  single  nested  relation  (SNR). 

Procedure: 

1.  Allocate  an  empty  single  nested  relation  SNR. 

2.  Wp  :=  the  root  of  the  empty  SNR. 

3.  Up  :=  the  root  of  NFT. 

4.  For  each  tp  €.  Fi,  /*  Fi  is  the  pivot  relation  fragment  */. 

Assemble(tt)p,  tip,  tp). 

Assemble(7i;p,  tip,  tp)  starts  navigation  from  tp  and  collects  the  set  of  tuples  € 
Fi,i  =  2, 3,---,n},  that  satisfy  the  join  conditions  among  Fi,  F2,  •  •  • ,  F„.  Then,  it 
inserts  tp  and  the  collected  set  of  matching  tuples  into  a  single  nested  relation.  For 
each  insertion,  it  first  finds,  from  the  assembly  plan,  the  set  of  relation  fragments  that 
are  to  be  merged  to  produce  the  tuple  to  be  inserted  and  their  associated  assembly 
plan  expression.  Secondly,  the  assembly  plan  expression  is  executed  on  those  tuples  to 
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be  merged.  Only  projection  is  performed  if  no  merging  is  prescribed  in  the  assembly 
plan  expression.  Thirdly,  the  resulting  tuples  are  inserted  to  corresponding  nested 
subrelations.  Every  insertion  is  preceded  by  a  searching  for  checking  if  there  already 
exists  a  duplicate  tuple. 

Algorithm  4.3.13  (Assemble_RF) 

Input:  A  node  (w.)  of  the  SNR,  a  node  (n.)  of  the  NET,  a  join  tree  (JT),  an  assembly 
plan  (AP),  and  a  tuple  to  from  which  we  start  navigation. 

Output:  SNR  with  newly  inserted  tuples. 

Procedure: 

1.  Wr  :=  the  node  pointed  by  Wi.Ui. 

/*  Wr  is  the  root  of  the  binary  search  tree  of  the  nested  subrelation  to  be 
searched.  */ 

2.  Find  V  =  {vi,v2,-  ■  ■  ,Vk}  from  AP  such  that  Ui  =  AP(ui,U2,  •••  ,Vk)  . 

/*  A:  >  1  if  and  only  if  a  merging  is  required.  * / 

3.  /*  Let  Ft  be  RFJT  for  Vj  €  V,  and  let  $,•  be  the  join  predicate  between 
Ft  and  Fj  where  RFJT(Fj)  is  the  parent  of  RFJT(Fi)  in  JT.  */ 

/*  Fmd  the  tuples  from  Fi,F2,  ■■■,Fk  that  match  U.  */ 

For  each  ti  G  Match(io,  Fi,  #i), 

For  each  E  Match(<i,  Fa,  #2), 

For  each  4  G  Match(tfc_i,  F*,  #*), 

(a)  tc  :=  AP(ti,t2,  •  •  •  /*  Execute  the  assembly  plan.  */ 

(b)  If  (wc  :=  Search(u;„t,))  =  NOTJOUND 

then  Wc  :=  Insert(u;,-, 

(c)  :=  the  set  of  ti,’s  children  in  NFT. 

(d)  If  $  =  {}  then  return 
else  For  each  Uc  E  9, 

Assemble(wc,Uc,  tc). 
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where  Search  and  Insert  are  the  same  algorithms  as  Algorithm  4.3.7  and  Algo¬ 
rithm  4.3.9,  respectively.  No  duphcate  checking  is  performed  if  no  projection  is 
required  in  the  assembly  plan,  although  it  is  not  exphcitly  shown  in  Algorithm  4.3.13. 

Given  a  tuple  U  €  Fi,  the  matching  tuples  in  another  relation  fragment  Fj,  which 
is  connected  to  Fi  through  a  join  condition  AiBAj,  are  collected  as  follows. 

Algorithm  4.3.14  (Match) 

Input:  a  tuple  ti  ^  Fi,  a.  relation  fragment  Fj,  and  a  join  condition  U.Adtj.B  where 
tj  G  Fj. 

Output:  {tj\tj  €  Fj,ti.ABtj.B}. 

Procedure: 

1.  Compute  the  address  of  a  bucket  header  using  ti.A  as  the  hashing  key. 

2.  For  each  bucket  from  the  bucket  header  through  the  end  of  the  chain, 

(a)  If  U.ABtj.B  is  satisfied  then  collect  the  pointer  to  tj,  where  tj  G  Fj  is 
a  tuple  pointed  by  the  bucket  entry.  (Remember  that  each  bucket  entry 
contains  a  pointer  to  a  tuple.) 

If  the  Match  process  and  its  subsequent  process  (the  execution  of  AP,  searching  for 
checking  duphcates,  and  insertion  to  SNR)  are  pipehned,  the  pointer  to  tj  is  not 
collected  but  passed  to  projection  operator  to  compute  tj.TTj. 

4.3.3. 5.5  Summary  In  summary,  the  nesting  of  relation  fragments  into  a  single 
nested  relation  is  performed  as  follows. 

Algorithm  4.3.15  (RF  nesting) 

Input:  a  set  of  relation  fragments,  a  nesting  format  tree,  and  a  join  tree. 

Output:  a  single  nested  relation. 

Procedure: 

1.  Purge  the  joins  of  the  join  tree  by  removing  all  join  predicates  from  conjunctive 
join  predicates  except  one  arbitrarily  selected  join. 
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2.  Generate  an  assembly  plan  by  comparing  the  join  tree  and  the  nesting  format 
tree. 

3.  Create  indexes  on  the  join  attribute  of  each  relation  fragment  except  the  pivot 
relation  fragment. 

4.  For  each  tuple  of  the  pivot  relation  fragment, 

Perform  navigation  along  the  joins  of  the  join  tree  and  find  the  set  of  match¬ 
ing  tuples  from  each  relation  fragment,  and  assemble  the  set  of  matching 
tuples  into  a  nested  tuple  according  to  the  assembly  plan. 

4.3.4  The  SNR  Method 

As  mentioned  in  Section  4.-3.-1,  the  materialization  of  a  single  nested  relation  is  per¬ 
formed  as  the  materialization  of  relation  fragments  followed  by  the  nesting  of  relation 
fragments  into  a  single  nested  relation.  Hence  it  is  sufficient  to  focus  only  on  the 
modifications  needed  to  adapt  the  RF  method  to  the  SNR  method,  that  is,  to  move 
the  nesting  step  to  a  server  and  transmit  a  single  nested  relation. 

processing  is  exactly  the  same  as  that  of  the  RF  method  and  therefore. 
Algorithm  4.3.2  can  be  used  without  modification  if  we  use  the  pipelined  nested  loop 
join  algorithm.  The  process  of  eliminating  duplicate  tuples  from  materialized  relation 
fragments  is  also  the  same  as  the  one  shown  in  Algorithm  4.3.3  except  that  tuples 
are  written  to  an  output  buffer  instead  of  being  transmitted  to  a  client. 

Once  the  tuples  of  the  relation  fragments  are  collected,  they  are  restructured 
into  a  single  nested  relation  on  a  server.  The  same  steps  as  those  of  the  RF  nesting 
described  in  Section  4.3. 3. 5  are  used  except  that  the  navigational  join  step  is  modified 
so  that  matching  tuples  are  not  only  assembled  into  nested  tuples  but  also  transmitted 
to  a  client.  According  to  Algorithm  4.3.13,  the  tuples  of  nested  subrelations  are 
transmitted  in  a  depth-first  search  order  of  the  nesting  format  tree.  Delimiters  are 
needed  to  distinguish  between  the  tuples  of  different  nested  subrelations.  For  example, 
the  stream  of  data  transmitted  for  the  single  nested  relation  of  Figure  4.15c  is  as 
follows.  ‘(’  and  ')’  are  delimiters. 
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Example  4.3.3 

Header:  {KA{DE{HG)){1J)) 

Data: 

{k2a2{d4e2{h2g3)){d5e4{higihsgi)){iiji)){kiai{d2e2{h^g3h4g2)){i2j2i2j4)) 

{k3a3{dse4{higih5gi)){i2j2i2j4)) 

□ 

where  {KA{DE{HG)){1J))  is  a  header  describing  the  format  of  the  data  stream 
following  the  header.  A  data  stream  consists  of  segments.  A  segment  contains  the 
tuples  that  will  belong  to  the  same  nested  subrelation  when  they  are  assembled  into 
a  single  nested  relation  by  a  chent.  Example  4.3.3  shows  three  segments  starting  with 
kiai,k2a2,  and  fcsOs,  respectively. 

What  remains  for  a  client  to  complete  the  nesting  process  is  to  parse  the  received 
data  stream  and  assemble  the  extracted  tuples  into  a  single  nested  relation.  Algo¬ 
rithm  4.3.16  describes  the  assembly  process.  For  each  tuple  t,-  read  from  the  data 
stream,  U  is  inserted  as  a  nested  subtuple  of  the  previous  tuple  if  U  is  preceded  by 
‘(’.  Otherwise,  U  is  inserted  in  the  same  nested  subrelation  as  the  previous  tuple.  In 
the  following  algorithm,  w„  is  the  current  insertion  entry  node  and  is  the  latest 
inserted  node.  The  current  insertion  entry  node  is  moved  one  level  up  for  each  We 
assume  the  availabihty  of  a  function  named  ‘Super’  which  returns  the  (super)node 
for  which  the  node  Wp  is  a  nested  subtuple.  For  example,  if  Wp  is  the  node  containing 
the  tuple  (^462  in  the  single  nested  relation  of  Figure  4.12,  then  Super(tt;p)  returns  the 
node  containing  the  tuple  k2a2. 

Algorithm  4.3.16  (Assemble_SNR) 

Input:  formatted  stream  of  tuples  of  a  single  nested  relation,  nesting  format  tree 
(NFT). 

Output:  an  assembled  single  nested  relation. 

Procedure: 

1.  Allocate  an  empty  single  nested  relation  (SNR). 

2.  Wo  ■=  the  root  of  the  empty  SNR. 
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3.  For  each  data  item  d  read  from  the  data  stream, 

•  If  d  =  ‘(’  then  Wp  :=  Wo- 

•  If  d  =  tj  (a  tuple)  then 

Find  the  schema  Si  of  ti  from  the  header. 

Up  :=  NSRNFT(5i). 

Wo  :=  Insert(u;p,Up,ti). 

•  If  d  =  ')’  then  Wo  :=  Wp,  Wp  :=  Super(u;p). 

where  the  process  of  the  Insert  is  shown  in  Algorithm  4.3.9.  Note  we  do  not  need 
searching  preceding  an  insertion  because  duplicates  have  already  been  eliminated  on 
a  server. 

Summarizing,  the  object  instantiation  process  of  the  SNR  method  is  executed  in 
the  following  steps. 

•  Materialization:  Query  processing,  duplicate  elimination,  join  purge,  assembly 
planning,  index  creation,  and  navigational  join  (and  transmission). 

•  Translation:  assembly  smd  reference  resolution. 

4.3.5  Data  Transmitted  in  Different  Methods 

In  the  transmission  phase,  data  prepared  by  a  server  is  transmitted  to  the  client 
which  sent  a  query  to  the  server.  As  mentioned  in  Section  4.3.2,  transmission  oc¬ 
curs  pipelined  with  the  materialization  process.  That  is,  tuples  of  the  materialized 
query  result  are  transmitted  zis  soon  as  they  become  available.  As  discussed  in  Sec¬ 
tion  4.3. 1.2,  the  structure  of  transmitted  data  differs  for  each  of  the  three  object  in¬ 
stantiation  methods,  and  have  different  set  of  redundant  data.  The  RF  method  and 
the  SNR  method  obviously  remove  redundaint  data  transmitted  in  the  SFR  method. 
However,  they  still  have  their  own  source  of  redundant  data. 

4.3. 5.1  Sources  of  Redundant  Data  in  the  RF  and  SNR  Methods 

In  the  RF  method,  some  relation  fragments  contain  extra  join  attributes.  For  example, 
the  relation  fragment  A'lL'  shown  in  Figure  4.15a  contains  two  join  attributes  A' 
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and  L^  As  illustrated  in  Figure  4.6,  the  number  of  tuples  emitted  for  each  relation 
fragment  by  the  query  processor  is  never  larger  than  that  for  a  single  flat  relation. 
Besides,  if  a  relation  fragment  contains  no  extra  join  attribute,  it  is  guaranteed  that 
the  duplicate  elimination  step  eliminates  more  tuples  from  the  relation  fragment  than 
from  the  single  flat  relation  that  would  be  retrieved  by  the  SFR  method  for  the  same 
query.  In  other  words,  the  relation  fragment  does  not  contain  more  tuples  than  the 
corresponding  single  flat  relation.  An  exception  may  occur  if  a  relation  fragment  does 
have  extra  join  attributes  whose  combined  domain  cardinality  (the  number  of  distinct 
values)  is  higher  than  the  combined  domain  cardinality  of  the  other  attributes.  In 
that  case,  it  may  happen  that  less  tuples  are  eliminated  from  the  relation  fragment. 
We  anticipate  that  this  situation  happens  rarely. 

In  the  SNR  method,  a  server  transmits  a  linearized  stream  of  nested  tuples.  The 
stream  of  data  contains  no  duplicate  subtuples  unlike  the  SFR  method®,  and  no  ex¬ 
tra  join  attribute  unlike  the  RF  method.  However,  a  subtuple  is  transmitted  multiple 
times  if  it  belongs  to  multiple  tuples  of  the  transmitted  nested  relation.  This  phe¬ 
nomenon  occurs  when  there  is  a  many-to-many  cardinality  relationships  between  two 
joined  relations.  For  example  in  the  formatted  stream  of  Example  4.3.3,  the  tuple 
h^gz  of  the  subrelation  HG  appears  as  the  subtuples  of  two  different  tuples  of  DE. 
Likewise,  h^g-i  and  h^gi  also  appear  twice  in  different  tuples  of  the  nested  relation. 

4.3. 5. 2  Trade-offs  between  Different  Methods 

It  is  certain  that  the  method  which  incurs  the  minimum  transmission  cost  is  the  one 
that  produces  the  least  amount  of  redundant  data.  We  observe  that  the  RF  method 
has  a  trade-off  with  the  SFR  method  depending  on  which  is  larger  between  the  amount 
of  redundant  data  eliminated  by  the  fragmented  materialization  of  query  result  and 
the  amount  of  redundant  data  introduced  due  to  the  extra  join  attributes.  Besides, 
there  is  a  trade-off  between  the  RF  method  and  the  SNR  method  depending  on  which 
is  larger  between  the  overhead  of  the  extra  join  attributes  in  the  RF  method  and  the 
overhead  of  the  midtiple  subtuple  occurrences  in  the  SNR  method.  On  the  other 
hand,  the  SNR  method  alw’ays  transmits  less  amount  of  data  than  the  SFR  method 
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because  a  single  nested  relation  cannot  have  more  tuples  than  its  corresponding  single 
flat  relation. 

We  anticipate  that,  in  practical  cases,  the  transmission  costs  of  the  RF  method  and 
the  SNR  method  are  comparable,  and  either  method  can  be  more  efficient  depending 
on  the  query.  Besides,  we  anticipate  that  the  redundancy  due  to  the  extra  join 
attributes  in  the  RF  method  is  insigniflcant  compared  to  the  redundancy  of  the 
dupHcate  subtuples  in  the  SFR  method. 


4.4  Development  of  a  Cost  Model 

4.4.1  A  Platform  for  Cost  Modeling 

In  this  section,  we  develop  cost  formulas  for  each  step  of  the  three  object  instantiation 
methods.  It  is  a  too  compHcated  task  to  obtain  a  cost  model  of  main  memory-resident 
operations  [85]  if  it  is  ever  possible.  As  for  the  cost  model  of  disk-based  operations,  it 
is  sufficient  to  count  the  number  of  page  reads/writes  in  accessing  disks,  or  including 
the  cost  of  buffer  management  together  if  higher  precision  is  needed.  However,  the 
cost  of  main  memory-resident  operations  depends  on  so  many  factors  such  as  the 
hardware  used,  programming  language,  programming  style,  and  system  load.  Our 
purpose  is  to  compare  the  costs  of  different  object  instantiation  methods,  rather  than 
to  estimate  the  costs.  In  other  words,  our  concern  is  to  find  out  which  method  among 
the  SFR,  RF,  and  SNR  methods  is  the  winner  given  a  couple  of  different  situations 
within  our  range  of  interest. 

Thus,  we  make  necessztry  approximations  and  simplifications  in  the  forthcoming 
cost  modeling  and  cost  comparison.  First,  as  mentioned  in  Section  4.2.4,  the  cost 
items  that  are  common  to  all  three  methods  are  excluded  from  consideration.  More 
specifically,  we  omit  the  query  processing  cost  from  the  materialization  phase  and  the 
reference  resolution  cost  from  the  translation  phase.  Secondly,  we  exclude  the  cost 
of  accessing  schema  information  from  our  cost  models  and  consider  only  the  cost  of 
operations  on  data  tuples.  Schema  access  cost  becomes  negligible  when  the  number 
of  manipulated  tuples  becomes  large  enough.  Thirdly,  we  ignore  the  effect  of  the 
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difference  in  the  speed  of  a  server  and  a  client.  Even  if  their  speeds  are  noticeably 
different,  its  effect  on  the  cost  comparison  result  is  minimal  in  the  environment  where 
the  network  communication  cost  is  significant. 

We  consider  only  complex  queries  -  queries  with  one  or  more  joins  -  to  develop  our 
cost  model.  In  other  words,  we  consider  only  the  case  of  n/  >  1  in  our  cost  model, 
where  ny  is  the  number  of  relation  fragments.  The  SFR,  RF,  and  SNR  methods 
become  identical  if  a  query  is  a  simple  query,  i.e,,  has  no  join.  That  is,  the  base 
relation  specified  in  a  simple  query  is  reduced  to  a  selected  and  projected  fragment, 
transmitted  to  a  client,  and  linked  to  other  objects  through  reference  resolution  step. 
Nesting  step  is  not  needed  for  the  single  fragment. 

4.4. 1.1  Cost  Parameters 

Table  4.1  and  Table  4.2  show  the  values  of  cost  parameters  for  elementary  main 
memory  operations  and  network  communications,  respectively,  that  are  used  in  our 
cost  formulas.  We  have  experimented  with  both  the  CPU  time  and  elapsed  time  for 
measuring  main  memory  cost  parameters.  Our  experiment  showed  that  the  elapsed 
time  varied  significantly  at  every  run  depending  on  the  system  load.  On  the  other 
hand,  the  CPU  time  was  measured  to  be  stable.  Therefore,  we  chose  the  CPU  time 
as  an  appropriate  measure  of  the  main  memorj*^  execution  time.  As  for  the  network 
communication  time,  CPU  time  did  not  show  any  noticeable  difference  between  LAN 
and  WAN  while  elapsed  time  did  show  a  big  (20  times)  difference.  Our  experiment 
showed  that  most  of  the  elapsed  time  for  a  WAN  was  spent  on  the  communication 
network  which  carries  the  data.  Local  processes  were  blocked  until  data  arrives.  On 
the  other  hand,  we  verified  that  most  of  the  elapsed  time  for  a  LAN  wq.s  spent  on 
local  hosts  for  sending  and  receiving  data.  These  facts  lead  us  to  the  conclusion  that 
the  elapsed  time  was  more  appropriate  for  measuring  communication  cost  parameters. 
Thus,  we  used  different  measures  for  the  main  memory  cost  parameters  and  commu¬ 
nication  cost  parameters.  Appendix  A  explains  how  the  values  of  the  cost  parameters 
were  obtained.  Precisely  speaking,  the  values  of  the  cost  parameters  of  main  memory 
operations  axe  different  on  a  server  and  a  client.  Nevertheless  we  use  the  same  cost 
values  for  both  the  server  and  the  client  as  an  approximation. 
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Parameter 

Description 

Value 

Ci, 

The  cost  of  elementary  binary  search  operation 
(compare  and  move  left  or  right). 

19  /isec 

Corn 

The  cost  of  comparing  two  tuples. 

9.2  /isec 

Cci 

The  initial  cost  of  copying  a  tuple. 

11  /xsec 

Ccb 

The  per-byte  cost  of  cop3dng  a  tuple. 

0.17  /isec/byte 

C. 

The  cost  of  evaluating  a  join  predicate  (equijoin  on 
attributes  of  type  integer). 

16  /isec 

Cfi 

The  per-byte  cost  of  folding  a  tuple  into  an  integer. 

0.92  ^sec/byte 

Che 

The  cost  of  hashed  address  computation  using  an 
integer  hashing  key. 

9.5  yLisec 

Cma 

The  cost  of  allocating  memory  within  workspace. 

1.2  /isec 

Cmp 

The  cost  of  moving  (reading  or  writing)  a  pointer. 

0.88  /isec 

Cpi 

The  initial  cost  of  performing  a  projection  on  a  tu¬ 
ple. 

4.3;  /isec 

Cpb 

The  per-byte  cost  of  performing  a  projection  on  a 
tuple. 

1.1  /isec/byte 

C,i 

The  initial  cost  of  computing  an  integer  hashing  key 
from  a  scanned  relation  column. 

17  /isec 

The  per-tuple  cost  of  computing  an  integer  hashing 
key  from  a  scanned  relation  column. 

14  ^sec/tuple 

Table  4.1;  Main  memory  cost  parameters  (CPU  time) 


Parameter 

Description 

Value 

LAN 

WAN 

Cl 

Cb 

The  latency  of  sending  a  message. 

The  per-byte  data  transmission  cost. 

2.5  msec 

3.4  ^sec/byte 

53  msec 

60  /zsec/byte 

Table  4.2:  Communication  cost  parameters  (elapsed  time) 
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SFR(T’) 

Parameter 

Description 

Nt 

The  cardinality  after  duphcate  elimination. 

dt 

The  ratio  between  the  cardinality  after  duplicate  elimination  and  the 
cardinality  before  duphcate  ehmination.  {dt  <  1.) 

Ti 

Tuple  size. 

RF  {Fi,i=  1, 2,  •  •  • ,  n/  where  Fi  is  the  pivot  relation  fragment) 

Uf 

The  number  of  relation  fragments. 

The  cardinahty  of  Fi  after  duphcate  ehmination. 

du 

The  ratio  between  the  cardinahty  after  duphcate  ehmination  and  the 
cardinahty  before  duphcate  ehmination.  [df.  <  1.) 

The  domain  cardinahty  -  the  number  of  distinct  values  -  of  the  join 
column  of  Fj  for  the  join  between  and  Fj. 

Tu 

The  tuple  size  of  Fi. 

PSi 

The  extra  join  attribute  (EJA)  ratio,  i.e.,  the  ratio  between  the  size 
of  extra  join  attributes  in  Fi  and  the  tuple  size  of  Fj.  {pf^  <  1) 

SNR  (5i,i  =  1,2,  •  •  •  ,71*  where  Si  is  the  pivot  nested  subrelation) 

Ua 

The  number  of  nested  subrelations  in  a  single  nested  relation. 

N,, 

The  caidinahty  of  Si. 

The  tuple  size  of  Si. 

Table  4.3:  Data  Parameters 


We  use  the  following  short-hand  notations  in  our  cost  formulas. 


O coltean{,N) 

Oeopy{T) 


CprojectiT) 


Cai  A  CanN  for  Scanning  N  tuples.  (4-6) 

Cci  +  Ca,T  for  copjring  a  tuple  of  size  T  bytes.  (4.7) 

Opt  +  CpbT  for  projecting  a  subtuple  of  size  T  bytes  out  of  a  tT:(^6.) 


4.4. 1.2  Data  Parameters 

The  paramaters  of  the  data  transmitted  in  different  object  instantiation  methods 
are  shown  in  Table  4.3. 
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4.4. 1.2.1  Alternative  Parameters:  a,j  and  fiij  We  define  Qjj  as  the  domain 
selectivity  of  the  join  column  of  a  relation  fragment  Pj,  i.e.,  the  average  number  of 
tuples  with  the  same  value  of  a  join  column  (after  duphcate  elimination),  for  the  join 
between  Fi  and  Fj.  Thus,  the  value  of  a,j  is  related  to  Nj.  and  Dj^.  as  follows. 


a,,  =  ^  (4.9) 

Since  relation  fragments  are  fully  reduced  before  they  are  joined  again  on  a  chent, 
two  joined  relation  fragments  F,  and  Fj  have  the  same  domain  cardinality  of  their 
join  columns,  i.e.,  =  Df^..  Hence,  aij  can  be  interpreted  to  denote  the  average 

number  of  matching  tuples  in  Fj  for  each  tuple  of  Fi.  We  call  a,j  as  selectivity  from 
Fi  to  Fj.  Since  Df■^  =  Dj.j,  the  following  is  always  true. 


(4.10) 


where  the  equality  holds  if  and  only  if  =  Nj^ ,  that  is,  all  values  in  the  join  column 
of  Fi  for  the  join  to  Fi  are  unique. 

is  defined  as  the  average  degree  of  nesting,  i.e.,  the  average  number  of  tuples 
in  Sj  for  each  tuple  of  Si  where  Sj  is  a  direct  nested  subrelation  of  Si.  Given  5,-  and 
its  nested  subrelation  Sj,  fiij  can  be  interpreted  as  the  ratio  between  Nt^  and 


(4.11) 


^0-  >  1  since  we  are  considering  only  inner  joins. 

Figure  4.17  contrasts  a^j’s  and  in  accordance  with  the  join  tree  and  the 
nesting  format  tree  of  Figure  4.16.  {cxij}  maps  onto  Note  some  cxij^s  map  to 

the  same  Ai  if  two  or  more  relation  fragments  are  merged  to  become  a  single  nested 
subrelation.  For  those  aij’s  and  /3ij’s  that  are  mapping  counterparts,  otij  ^  fJij  in 
general. 

From  Equation  4.11,  we  can  derive  the  value  of  W,.  as  follows. 


n  for  i  =  2, 3,  •  •  • ,  n,  (4.12) 

(NSRNFT(5p).NSRNPT(S9))  €  Pu 

where  Pu  denotes  the  path  from  NSRNFT(5i)  to  NSRNFT(Si)  in  the  nesting  format 
tree.  That  is,  Ng^  is  computed  as  W,,  multiplied  by  all  /Sp^’s  between  S\  and  Sj. 
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KA  =  Si 
DE  =  52 
HG  =  53 
LJ  =  54 


4.4.1.2.2  Relationships  between  Data  Parameters  Provided  with  the  same 
query,  the  data  parameters  of  different  methods  shown  in  Table  4.3  are  not  indepen¬ 
dent  of  one  another  but  are  related  by  some  quantitative  relationships.  Let  us  now 
discuss  the  relationships  between  different  data  parameters. 

SFR  vs.  RF;  It  was  indicated  in  Section  4.2.2  that  the  cardinality  {Nt)  of  a 
single  flat  relation  increases  monotonicaJly  with  respect  to  the  cardinality  (Nf^)  of 
the  pivot  relation  fragment  multiplied  by  the  afj’s  of  all  joins  (F,-XFj’s)  in  the 
query,  independent  of  the  query  shape.  There  axe  ny  —  1  joins  among  the  relation 
fragments  after  the  purging  step  of  Algorithm  4.3.15.  Hence, 

Nt<Nf,  n  ■  “‘i 

<RFJT(Fi),RFJT(Fj))  €  S(JT) 

where  jE(JT)  denotes  the  set  of  the  edges  of  the  join  tree  JT.  (RFJT(Ft),  RFJT(Fj))  G 
jB(JT)  means  that  there  is  a  join  between  Fi  and  Fj.  The  equahty  holds  true  if  (not 
only  if)  there  is  no  extra  join  attribute  in  Fj,  x  =  1, 2,  •  •  • ,  n/. 

Since  relation  fragments  may  have  extra  join  attributes  while  a  single  flat  relation 
has  no  such  extra  attribute,  the  following  relationship  exists  between  Tt  and  Ty/s. 

T,  =  f.W-l’l.)  (414) 

t=l 

SFR  vs.  SNR:  We  can  think  of  Nt  as  the  number  of  tuples  generated  when 
we  ‘flatten’  the  nested  tuples  of  a  corresponding  single  flat  relation.  The  cardinality 
of  the  pivot  subrelation  5i  is  N^^ ,  and  a  single  tuple  of  a  nested  subrelation  Si  is 


DE  LJ 

023 

HG 

(b)  Nesting  format  tree 
Figure  4.17:  a^-  vs.  0ij 


KAD  =  Fi 
DGE  =  F2 
HG  =  F3 
AIL  =  F4 
IL  =  Fb 
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replicated  times  when  it  is  flattened  with  its  nested  subrelation  Sj.  Hence, 

Nt  =  N.,  n  Ai  (4.15) 

<NSRNPT(Si),NSRNPT(5^))  €  F(NPT) 

where  E(NFT)  denotes  the  set  of  edges  in  the  nesting  format  tree  NFT.  (NSRNFT(5i), 
NSRNFT(5j)  )  €  F?(NFT)  means  that  Sj  is  a  (direct)  nested  subrelation  of  Sj. 

The  attributes  of  the  flattened  relation  are  composed  of  the  atomic  attributes  of 
the  original  nested  relation.  Since  flattening  does  neither  add  nor  remove  any  of  the 
atomic  attributes,  we  obtain  the  following  relationship  between  the  tuple  sizes  of  the 
flat  relation  and  the  original  nested  relation. 

T,  =  Y.T.,  (4.16) 

Jfe=l 

RF  vs.  SNR:  As  mentioned  in  Section  4. 3. 3. 5. 2,  the  number  of  the  nodes  of  join 
tree  is  no  more  than  the  number  of  the  nodes  of  a  nesting  format  tree  because  two 
or  more  relation  fragments  can  be  merged  to  a  single  nested  subrelation  of  the  single 
nested  relation.  Hence, 

y 

n,  <n}  (4.17) 

The  cardinality  of  the  pivot  relation  fragment  and  the  cardinality  of  the  pivot  nested 
subrelation  axe  always  equal  because  both  contain  the  key  of  the  pivot  relation. 

Ni,=N,,  (4.18) 

4.4.2  Derivation  of  Cost  Formulas 

In  this  section,  we  develop  cost  formidas  of  each  step  of  object  instantiation.  Remem¬ 
ber  the  query  processing  cost  and  the  reference  resolution  cost  are  not  included  in 
our  partial  cost  model. 

4.4. 2.1  Duplicate  Elimination  Cost 

The  duplicate  ehmination  process  (Algorithm  4.3.3)  is  the  same  for  all  three  methods 
except  that  it  is  apphed  to  a  single  flat  relation  for  the  SFR  method,  and  to  each 
relation  fragment  for  the  RF  or  SNR  method.  Therefore,  it  is  clear  that  the  cost 
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of  the  duplicate  elimination  is  proportional  to  [Ntldt)Tt  for  the  SFR  method  and  to 
Yl^=i{NjJdj.)Tf-  for  the  RF  or  SNR  method. 

We  make  the  following  assumptions  for  the  hashing  of  tuples  which  was  described 
in  Algorithm  4.3.3. 

•  W’e  allocate  as  many  bucket  headers  as  half  of  the  cardinality  of  a  hashed  re¬ 
lation  (query  processing  output),  and  the  cardinality  of  the  hashed  relation  is 
estimated  by  a  query  optimizer. 

•  The  shift  folding  technique  [83,  84]  is  used  for  the  hashing  of  tuples.  In  this 
technique,  a  tuple  is  partitioned  into  several  parts  of  an  integer  size.  All  but  the 
last  parts  have  the  same  length.  The  parts  are  then  added  together  to  obtain 
an  integer  hashing  key. 

Given  these  assumptions,  the  cost  of  eliminating  duphcate  tuples  from  a  hashed 
relation  is  derived  as  follows.  Let  N  be  the  cardinality  of  the  relation  after  duplicate 
elimination  and  T  be  the  tuple  size  of  the  relation,  and  d  be  the  ratio  of  the  cardinality 
after  duphcate  ehmination  over  the  cardinahty  before  duphcate  ehmination  (d  <  1). 

The  allocation  of  a  bucket  header  costs  Cma-  Step  2  of  Algorithm  4.3.3  is  repeated 
N/d  times.  The  cost  of  computing  a  hashed  address  using  the  shifted  folding  technique 
is  computed  as  a  function  of  the  tuple  size  T  as  follows. 

Ct^r>Ha.h{T)=CflT^CHc  (4.19) 

Among  the  N/d  hashed  tuples,  N  tuples  are  actually  inserted  and  the  other  N(d—  N 
tuples  are  discared.  Therefore,  the  probabihty  of  a  tuple  being  inserted  is  d  and  the 
probabihty  of  being  discarded  is  1  —  d.  If  the  same  tuple  already  exists,  it  takes  the 
cost  of  traversing  average  half  of  a  bucket  chain  Cmp  +  {Nb/2){Ccm  +  C'mp)  where  Ny 
is  the  number  of  buckets  inserted  in  the  chain  so  far.  Otherwise,  it  cost  the  traversal 
of  the  entire  bucket  chain  {Cmp  +  Nb{Cem  +  Cmp)),  and  the  insertion  of  a  new  bucket 
in  the  chain  {Cma  +  Ccopy{T)  +  2C'mp). 

Nb  is  obtained  as  follows.  It  was  assumed  that  the  size  of  a  bucket  header  table 
we  allocate  is  50%  of  the  cardinality  of  the  hashed  relation.  That  is,  N/2d  headers 
are  allocated,  and  N  hashed  entries  are  eventually  inserted  into  these  headers.  If 
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N  >  N/2d,  i.e.,  >  1/2,  all  buckets  headers  are  eventually  filled,  assuming  the  hash 

function  distributes  a  hashing  key  uniformly  over  the  bucket  header  table.  In  this 
case,  the  ultimate  value  of  Nb  becomes  N/{Nl2d)  =  2d.  Otherwise,  if  d  <  1/2,  only 
N  bucket  headers  out  of  N/2d  headers  are  filled  and  the  ultimate  value  of  Nh  becomes 
1.  As  for  the  intermediate  value  of  Nb  in  the  middle  of  insertions,  we  use  half  of  the 
ultimate  value  as  an  expected  value.  Thus, 

iVfc  =  MAx(d,i)  (4.20) 

The  cost  of  transmitting  the  inserted  tuple  is  part  of  the  transmission  cost  and  is 
not  included  here.  Thus,  the  cost  of  inserting  a  hashed  tuple  into  a  chain  of  hashing 
buckets  is  computed  as  a  function  of  T  and  d  as  follows. 

Ctupirtserti^d^  T)  =  d[Cmp  +  N\f{^Ccm  H"  “f  Cma  +  C^copy(T)  -f  2Cmp)  + 

Nh 

(1  —  d){Cmp  +  ^{Cem  +  Cmp))  (^*21) 

where  the  value  of  Nh  is  computed  as  follows. 

Using  Equation  4.19  and  Equation  4.21,  the  SFR  duplicate  elimination  cost  is 
computed  as  follows. 

Csjrde  ^  Cma  4’  '^(C^tupAio«/i(Ti )  +  Ctupinscrii.^t’jTty)  (4.22) 

The  cost  of  eliminating  duplicate  tuples  from  relation  fragments  Fi^i  =  1, 2,  •  •  • ,  ny, 
is  computed  as  follows. 

Crfde  =  (  C'tuphaah  ( 3/ J  +  Ctupinaert{dfi^Tf.)))  (4.23) 

i=l 

Since  the  query  result  of  the  SNR  method  is  also  a  set  of  relation  fragments,  its 
duplicate  elimination  cost  is  the  same  as  that  of  the  RF  method  except  that  it  incurs 
the  additional  cost  of  writing  non- dupH cate  tuples  to  an  output  buffer  instead  of 
transmitting,  them  to  a  cHent  as  in  the  RF  method.  The  cost  of  writing  non-duphcate 
tuples  from  a  relation  fragment  Fi  to  an  output  buffer  is  Ccopy{Tf-)Nf..  Thus,  the 
cost  of  ehminating  dupHcate  tuples  from  Fi^i  =  1,2,  •  •  •  ,n/,  to  be  used  in  the  SNR 
method  is  computed  as  follows. 

^anrde  “ 

t=l 


(4.24) 
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4. 4.2.2  Nesting  Cost 

4.4.2.2.1  Binary  Search  Tree  Searching  and  Insertion  Costs  The  searching 
(Algorithm  4.3.7)  and  insertion  of  one  tuple  (Algorithm  4.3.9)  are  used  commonly 
for  all  three  object  instantiation  methods.  Hence,  we  deal  with  their  cost  formulas 
separately  here.  We  assume  all  binary  search  trees  implementing  nested  subrelations 
are  weU-balanced.  In  fact,  well-balanced  trees  are  common  and  degenerate  trees  are 
very  rare  [94].  Even  if  a  binary  tree  should  be  balanced  sometimes,  a  tree  balancing 
involves  only  pointer  movements  and  incurs  negligible  cost. 

Let  M  be  the  number  of  tuples  that  are  attempted  to  be  inserted  into  a  binary 
search  tree.  Every  attempt  of  insertion  requires  one  searching  to  check  if  the  same 
tuple  has  already  been  inserted  into  the  binary  search  tree.  Let  N  denote  the  number 
of  tuples  that  are  actually  inserted  into  a  binary  search  tree.  According  to  Knuth  [94], 
a  single  searching  requires  about  1.386  log2  k  comparisons  {k  is  the  number  of  nodes 
currently  in  the  binary  search  tree)  for  a  weU-balanced  binary  search  tree,  considering 
both  a  successful  search  and  an  unsuccessful  search.  If  we  assume  the  insertion  of  the 
N  tuples  out  of  M  tuples  occurs  at  a  regular  interval,  the  value  of  k  is  incremented  at 
every  M/N  insertion  attempts.  Then,  the  total  searching  cost  for  inserting  N  tuples 
out  of  the  attempted  M  tuples  is  computed  a5  foUows. 

^  M 

Cur^arcHiM,  N)  =  j:(--1.386(7fc.  logj  k)  (4.25) 

fc=:l 

Insertion  cost  is  the  sum  of  the  cost  of  an  unsuccessful  searching  and  the  cost  of 
inserting  a  node  as  a  leaf  of  the  binary  search  tree.  An  unsuccessful  searching  of  a 
binary  search  tree  requires  log2{k  +  l)  comparisons.  Node  insertion  at  the  leaf  requires 
the  allocation  of  an  empty  node  (Cmo),  copying  tuple  into  the  node  {Ccapy{T))^  and 
writing  a  pointer  to  the  node  in  the  parent  node  {Cmp)-  Thus,  the  total  cost  of 
inserting  N  tuples  to  a  binary  search  tree  is  computed  as  follows. 

Cu7Un»ert{N,  T)  =  ^(Cj,,  log2(A:  +  1)  +  Cma  +  Ccopy{T)  -j-  Cmp)  (4-26) 

fc=l 

There  will  be  tuples  inserted  into  a  nested  subrelation  Si  of  the  final  output  sin¬ 
gle  nested  relation.  Let  SpaT(i)  denote  the  nested  subrelation  such  that  NSRNFT(Spor(t)) 
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is  the  parent  of  NSRNFT(5i).  Then,  there  exist  binary  search  trees  implement¬ 

ing  the  nested  subrelation  Si,  i.e.,  one  binary  search  tree  for  each  tuple  of  Spar{iy  Let 
M,.  denote  the  number  of  tuples  that  are  attempted  to  be  inserted  into  Si.  If  we  as¬ 
sume  tuples  are  uniformly  distributed  into  every  binary  search  tree  of  Si, 
tuples  are  attempted  for  insertion  and  tuples  are  actually  inserted  into 

each  binary  search  tree  of  Si.  Thus,  the  total  cost  of  inserting  N,.  tuples  into  Si  out 
of  the  attempted  tuples  is  computed  as  follows. 

M  .  N  . 

Csiteareh{Ms^,  Nf^,  =  Ns^^^^..^Cbin,eareh{~j^  j  )  (4-27) 

‘paT(i)  **iiar(«) 

Csiintert{Nt.,T,-,  —  Nt^^^^..^Cbininterti~^ 

*jiaT(i) 

4. 4. 2. 2. 2  SFR  Nesting  Cost  We  consider  only  the  costs  of  projecting  tuples, 
searching  tuples  (Algorithm  4.3.7),  and  inserting  tuples  (Algorithm  4.3.8),  which  are 
the  operations  on  data  tuples  and  whose  costs  are  dominant. 

According  to  Algorithm  4.3.5,  Nt  tuples  are  assembled  to  a  single  nested  relation. 
Each  one  of  the  Nt  tuples  is  decomposed  into  subtuples  belonging  to  different  nested 
subrelations  Si,S2,--- ,  Sn,  by  projections.  For  each  of  the  Nt  tuples,  the  projection  of 
the  tuple  on  the  schema  of  Si  costs  CprojeetiTsi).  The  searching  of  Si  for  the  projected 
subtuple  costs  Cn»earch{Nt,  Nt-,  and  the  insertion  of  the  projected  subtuple 

into  Si  costs  C',iin*erf(A^»i,  ■^«p«,(,))-  Hcnce,  the  total  cost  of  assembling  Nt  tuples 
of  a  single  flat  relation  into  a  single  nested  relation  is  computed  as  follows. 

Cgfrnest  ^  X ^project (Ti.- )JVt  -f  Caiteareh(,Nt,  Nf^,  -j-  C',jjn,ert (IV,^ ,  7,^ ,  )) 

(4.29) 

4.4. 2.2.3  RF  Nesting  Cost  We  ignore  the  costs  of  the  join  purge  step  and  the 
assembly  planning  step  because  they  are  not  operations  on  data  tuples.  Accordingly, 
we  approximate  the  RF  nesting  cost  as  the  sum  of  the  index  creation  cost  and  the 
navigational  join  cost. 

Crfnest  ~  Cixcrt  "1“  Cnavjn  (4.30) 
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The  number  of  joins  among  the  relation  fragments  is  2ilways  one  less  than  the 
number  of  the  relation  fragments  (n/  —  1)  after  the  join  purge  step. 

Index  creation  (Algorithm  4.3.11):  For  a  relation  fragment  Fi,  the  cost  of  bucket 
header  allocation  is  Cma-  The  linear  scan  costs  Ccoi»can{Nf-).  As  for  the  hashing  of 
join  column,  we  assume  all  join  attributes  are  integers  so  that  no  folding  is  required. 
For  each  of  the  Nf  scanned  join  column  values,  hashing  computation  costs  Che  and 
insertion  to  a  hashing  bucket  chain  takes  the  cost  of  allocating  a  bucket  (Cma),  writing 
a  pointer  (C'mp)  to  the  tuple  containing  the  hashed  attribute,  and  two  pointer  writings 
{2Cmp)  to  make  connections  to  other  buckets.  Note  we  do  not  need  to  scan  the  entire 
chain  of  buckets  because  no  duplicate  checking  is  required.  Hence,  the  cost  of  creating 
n/  —  1  indexes  on  Fi.Aj’s  for  i  =  2, 3,  •  •  •  ,n/,  where  Fj  is  the  pivot  relation  fragment, 
is  computed  as  follows. 

Ci^ert  =  +  Ccol,canW  +  {Che  +  Cma  +  3Cmp)Nf,)  (4.31) 

1=2 

Navigational  join  (Algorithm  4.3.12):  The  allocation  of  an  empty  single  nested 
relation  costs  Cma-  As  for  the  assembly  cost  (Algorithm  4.3.13),  we  consider  only 
the  costs  of  following  operations  on  data  tuples:  the  cost  of  finding  matching  tuples 
(Algorithm  4.3.14),  the  cost  of  executing  assembly  plans  (AP)  on  the  found  tuples, 
and  the  cost  of  inserting  (Algorithm  4.3.9)  the  resulting  tuples  into  the  single  nested 
relation  after  checking  for  duplicate  tuples  (Algorithm  4.3.7). 

Matching  (Algorithm  4.3.14):  The  cost  of  Match(ti,  denoted  by 

Cmatehij,  is  Computed  as  follows.  First,  hashing  of  a  join  column  costs  Che-  Let  Nf, 
denote  the  expected  length  of  the  chain  of  buckets  including  the  header  bucket.  Then, 
in  Step  2,  it  costs  Nb{2Cmp  +  Ce)  to  follow  the  chain  of  buckets  -  one  Cmp  for  reading 
a  pointer  to  a  tuple  tj  €  Fj,  the  other  Cmp  for  reading  the  pointer  to  next  bucket, 
and  Ce  for  evaluating  the  join  predicate  U.ABtj.B.  cnj  tuples  of  Fj  are  collected  from 
Ma,ich.{ti,Fj,ti.A9tj.B).  The  collection  of  matching  tuples  incurs  only  the  cost  of 
writing  ajj  pointers  {CmpOHj)-  Thus,  the  cost  of  finding  matching  tuples  of  Fj  for 
each  tuple  tj  of  Fi  is  computed  as  a  function  of  otij  as  follows. 


Cmatehij  {<^ij)  —  ^he  +  Nb{2Cmp  +  Ce)  +  CmpOtij 


(4.32) 
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where  the  value  of  is  obtained  as: 

Nb  =  MAX  [Nj./Dj,., 2)  (4.33) 

=  MAX (aij,  2)  by  Equation  4.9  (4-34) 

•in  the  same  way  we  obtained  the  ultimate  value  of  for  Equation  4.20.  As  mentioned 
in  Section  4.3,3.43  assume  the  allocate  bucket  header  size  is  50%  of  the  cardinality 
of  a  hashed  relation  fragment. 

The  cost  of  the  entire  matching  process  is  the  sum  of  the  cost  of  linear  scan  on  the 
pivot  relation  fragment  {Ccohcani^ji))  and  the  cost  of  finding  matching  tuples  from 
the  other  relation  fragments. 

Cmatch  =  Ccol8can{Nf^)  +  ^  ^fiCmatchij{oLij)  (4.35) 

where  Leaf(JT)  denotes  the  set  of  the  leaves  of  the  join  tree  JT  and  Lf^  is  obtained 
as  follows. 

Lji  =  n  ^5  (4.36) 

(RPJT(l?p),RFJT(F,))  €  Pu 

where  Pi*  is  a  path  from  RFJT(Pi)  to  RFJT(Pi). 

Execution  of  assembly  plans  (Step  3a  of  Algorithm  4.3.13):  The  tuples  of  rela¬ 
tion  fragments  that  are  found  by  the  matching  process  are  merged  according  to  the 
prescription  of  the  assembly  plan.  Let  be  the  number  of  relation  fragments  whose 
tuples  are  merged  to  produce  tuples  to  be  inserted  into  a  nested  subrelation  Sj,  and 
T^pj  =  1,2,  ••  •  jTTZj,  denote  the  size  of  the  attributes  projected  from  each  one  of 
the  to-be-merged  relation  fragments.  Then,  the  following  equation  holds  true. 

mi 

(4.37) 

i=i 

The  case  of  merging  two  tuples  from  two  relation  fragments  requires  two  projections 
on  the  tuples.  Extending  from  this  case,  the  cost  of  merging  77i{  tuples  from  mi  relation 
fragments  into  the  tuple  of  a  nested  subrelation  Si  is  obtained  as  (^projeetiFf), 
This  formula  can  be  rewritten  as  a  function  of  T,.  and  mi  using  Equation  4.8  and 
Equation  4.37. 


Capextci{Ts-^mi)  —  (mi  ^  l)C'pi  -f  Cproject{Tg-) 


(4.38) 
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(a)  Join  tree 


(b)  Nesting  format  tree 


51  =  HFi  Ta  =  {Fi}  7i  =  1 

52  =  n(F2  txi  F3)  Fj  =  {^2,^3}  72  =  0:120:23 

Sz  =  IIFb  Fs  =  {Fe}  73  =  0:36 

Si  -  ILF4  F4  =  {Fi}  74  =  o:i4 

Ss  =  IIFs  Fs  =  {F5}  75  =  0:25 

(c)  Assembly  plan  (d)  F  partition  (e)  7j 


Figure  4.18:  An  example  of  F*  and  7^ 

Since  n,  nested  subrelations  are  produced  out  of  n/  relation  fragments,  n/  —  n, 
mergings  occur.  It  depends  on  a  query  to  determine  which  relation  fragments  are 
merged  to  produce  each  nested  subrelation  Si.  Let  us  consider  a  set  of  n/  —  1  ajj-’s 
that  are  defined  on  n/  relation  fragments.  We  define  a  partition  on  this  set,  i.e., 
[F1IF2I  •  •  •  IFn,]  where  each  Fa,,^  =  l,2,-*-,n,,  is  the  set  of  Fj’s  that  are  merged 
to  produce  tuples  to  be  inserted  into  a  nested  subrelation  Sk-  Let  7*  denote  the 
combined  value  of  the  ctij’s  to  the  F^’s  belonging  to  Tk  and  be  defined  as  follows. 

7fc  =  n  where  aji  =  1  (4.39) 

Figure  4.18  shows  an  example  of  F*.  and  7*.  Note  the  rrik  of  Equation  4.38  is  equal 
to  the  number  of  Fy’s  in  Ffc. 

Given  Equation  4.39,  the  total  cost  of  executing  assembly  plans  is  computed  as 
follows. 

n, 

Capexec  ~  apexeci  {Tsi,mi) 


(4.40) 
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where  My,  is  the  number  of  tuples  produced  for  5,  and  computed  as  follows. 

n  7p  (4.41) 

NSRNFT(5;,)  €  Pu 

where  Pn  is  the  path  from  NSRNFT(5'i),  which  is  the  root  of  the  nesting  format  tree, 
to  NSRNFT(5.). 

Searching  (Algorithm  4.3.7)  and  Insertion  (Algorithm  4.3.9):  The  My.  tuples  pro¬ 
duced  by  the  execution  of  assembly  plans  are  attempted  to  be  inserted  into  a  nested 
subrelation  Si.  Then,  the  searching  cost  becomes  2”=!  C'«,earc;i(My.,  A,.,  and 

the  insertion  cost  becomes  E?=i  T,.,  using  Equation  4.27  and 

Equation  4.28. 

Thus,  the  total  cost  of  performing  navigational  joins  on  relation  fragments  is 
obtained  as  follows. 

n, 

Cnavjn  —  C match  Capexee  3  ^Si  3  '^5par(t))  5  3  )) 

i=l 

(4.42) 

4«4.2.2.4  SNR  Nesting  Cost  and  Assembly  Cost  Nesting:  As  mentioned 
in  Section  4.3.4,  the  SNR  method  uses  the  same  nesting  process  on  a  server  as  the 
RF  method  except  that  the  navigational  join  process  is  modified  so  that  tuples  that 
are  inserted  into  a  single  nested  relation  are  transmitted  to  a  client  as  well.  The 
transmission  cost  is  considered  separately  in  Section  4. 4. 2. 3  and  not  considered  here. 
Since  we  ignore  the  difference  between  server  speed  and  cHent  speed,  the  SNR  nesting 
cost  is  the  same  as  the  RF  nesting  cost. 

^ anrnest  ^  ^r/nc«i  (4.43) 

Assembly  (Algorithm  4.3.16):  There  is  an  additional  cost  of  assembling  the  re¬ 
ceived  data  stream  into  a  single  nested  relation  on  a  chent.  Considering  the  cost  of 
operations  on  tuples  only,  the  cost  of  assembhng  the  received  data  stream  is  computed 
using  Equation  4,28. 

C anraaaem  “  C^tmacrt  (-N,^  ,  ) 


(4.44) 
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4. 4. 2. 3  Transmission  Cost 

We  use  a  simple  model  [64]  of  data  transmission  cost  defined  as  follows  . 

Transmission  cost  =  Ci  A  Cb  x  Size  (4.45) 

where  Size  is  the  number  of  b3’’tes  of  the  transmitted  data. 

In  the  SFR  method,  the  amount  of  transmitted  data  is  equal  to  the  size  (in  bytes) 
of  a  single  flat  relation,  i.e.,  NtTt,  and  hence,  the  transmission  cost  is  as  follows. 

C.frt.  =  Ci  +  CbNtTt  (4.46) 

On  the  other  hand,  in  the  RF  method,  the  amount  of  transmitted  data  is  the  sum  of 
the  sizes  =  1,2,  ••  •  ,n/)  of  relation  fragments. 

C,„,  =  C,  +  Ctf:Ni,T,,  (4.47) 

In  the  SNR  method,  if  we  ignore  the  size  of  the  header  and  delimiters  because  it  is 
trivial,  the  amount  of  transmitted  data  is  the  sum  of  the  sizes  {Ns-T,i,i  —  1, 2,  •  •  • ,  n,) 
of  nested  subrelations. 

C gnrtx  —  Cl  ACbf^N,,Z,  (4.48) 

t=l 

4.5  Comparison  of  Costs 

In  this  section,  we  compare  the  costs  of  the  three  different  object  instantiation  meth¬ 
ods  using  the  cost  model  developed  in  Section  4.4.2.  Table  4.4  shows  the  distribution 
of  cost  items  which  have  been  used  in  our  cost  model.  Note  Cgueryproc  and  Crefra  are 
not  part  of  our  cost  model. 

We  first  discuss  the  input  data  parameters  that  were  used  for  cost  comparison 
and  introduce  the  selectivity  (oijj)  and  EJA  ratios  (p/J  as  the  variant  input  data 
parameters.  Then,  we  present  the  results  of  cost  comparison.  We  carried  out  the 
cost  comparison  in  two  different  ways;  sample  case  test  and  simulation.  We  first 
show  the  costs  of  the  SFR,  RF,  and  SNR  methods  by  simulations  using  randomly 
generated  values  of  data  parameters.  Then,  we  compare  the  costs  using  sample  data 
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Method 

Server 

Network 

Client 

SFR 

^qucryproa  sfrde 

C'sfrtx 

^ afrnestf  Crefrcs 

RF 

Cq^ucrypToc’i  ^rfde 

Crftx 

Crfnest^  C ref  res 

SNR 

Cgucryproci  Crfdej  ^rfncst 

Cgnrtx 

^  snrassem  y  C ref  res 

Table  4.4:  Distribution  of  cost  items 

parameters  and  observe  the  dependency  of  costs  on  the  values  of  selectivities  and 
EJA  ratios.  The  observed  result  is  reinforced  by  another  round  of  simulation  using 
random  values  of  data  parameters,  this  time  with  biases  given  to  the  domains  of  the 
values  of  selectivities  and  EJA  ratios  relatively  to  the  original  domains. 

4.5.1  Input  Data  Parameters 

We  used  the  data  parameters  of  the  RF  method  as  the  base  set  of  input  data  pa¬ 
rameters  and  derived  the  values  of  the  data  paxameters  of  the  SFR  method  and  the 
SNR  method  using  the  relationships  we  have  developed  in  Section  4.4. 1.2.2.  Besides, 
based  on  our  discussion  of  the  amount  of  transmitted  data  in  Section  4.3.5,  we  have 
chosen  two  data  parameters,  the  selectivity  (oij’s)  and  the  extra  join  attribute  (EJA) 
ratio  (p/.’s),  as  the  variant  input  parameters.  The  value  of  Oij  is  an  indicator  of  the 
overhead  of  the  duplicate  subtuples  in  the  SFR  method  and  the  multiply  occurring 
subtuples  in  the  SNR  method.  The  value  of  pf.  is  an  indicator  of  the  overhead  due 
to  the  extra  join  attributes  in  the  RF  method.  The  examples  shown  in  Figure  4.19 
and  Figure  4.20  illustrate  how  the  values  of  the  selectivity  and  the  EJA  ratios  affect 
the  costs  of  the  three  methods. 

Figure  4.19a  shows  an  example  of  high  selectivities  among  three  relation  fragments. 
Let  us  assume  the  two  selectivities  are  the  same  and  equal  to  n  >  1.  Then,  the  average 
cardinality  of  a  corresponding^  single  flat  relation  is  2  x  n  x  n  where  2  is  the  cardinality 
of  the  first  relation  fragment.  The  corresponding  single  nested  relation  contains  2 
nested  tuples,  within  which  there  exist  2  x  n  x  n  subtuples  of  DE.  On  the  other  hand, 
the  selectivities  are  equal  to  1  in  Figure  4.19b.  In  that  case,  both  the  single  flat  relation 


^I.e.,  materialized  for  the  same  query  on  the  same  database 
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RfI'  RF'J' 

KA  B’  “BCD’  "DE  KA  B’  ^  B  C  D’  ^DE 


SNR>U  SFRnI'  snr^  sfr-I" 

KA  BC  DE  KA  BC  DE  KA  B  C  DE  KA  B  C  DE 


(a)  High  selectivities  (b)  Low  selectivities 


Figure  4.19:  Examples  of  high  vs.  low  values  of  selectivity 
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A  b  - b  - (J  D  - D  t 

a  ■  ■  E 


A  B  - B’C  - CD 

I — I — I  r-nr- .  — i — 

I  — I—  — I — 

I  i 


snrT  sf£ 

A  E  A  E 


snrI 

A  B  B’  C  C’D 


spd' 


(a)  High  EJA  ratios 


(b)  Low  EJA  ratios 


Figure  4.20:  Examples  of  high  vs.  low  values  of  EJA  ratios 
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(a)  Join  tree  (b)  Nesting  format  tree 


Figure  4.21:  A  sample  query  for  random  values  of  data  parzuneters 

and  single  nested  relation  contain  only  two  (=2x1x1)  tuples.  From  these  examples, 
we  observe  that,  given  a  set  of  relation  fragments,  higher  selectivities  increase  the 
cardinality  of  the  single  flat  relation  and  the  cardinalities  of  nested  subtuples  of  a 
single  nested  relation,  thus  increasing  the  amount  of  data  (in  bytes).  Figure  4.20a 
shows  an  exzimple  of  relation  fragments  with  high  EJA  ratios.  In  the  example,  all 
attributes  except  A  and  E  are  extra  join  attributes.  In  that  case,  the  corresponding 
single  flat  relation  and  single  nested  relation  contain  only  the  two  attributes  A  and 
E.  On  the  other  hand,  the  relation  fragments  shown  in  Figure  4.20b  have  no  extra 
join  attributes.  In  this  case,  all  six  attributes  appear  in  the  corresponding  single  flat 
relation  and  single  nested  relation.  From  these  examples,  we  observe  that,  given  a 
set  of  relation  fragments,  higher  EJA  ratios  decrease  the  tuple  sizes  of  the  single  flat 
relation  and  single  nested  relation,  thus  decreasing  the  amount  of  data  (in  bytes). 
Certainly  the  costs  depend  upon  the  amount  of  data  to  be  handled  to  retrieve  the 
same  single  nested  relation.  Therefore,  higher  selectivities  and  lower  EJA  ratios  are 
more  advantageous  to  the  RF  method  than  the  SFR  or  SFR  methods  in  terms  of 
cost. 

4.5.2  Overall  Comparison  using  Simulation 

We  computed  the  average  costs  of  the  SFR,  RF,  and  SNR  methods,  and  tallied 
the  winning  counts  —  the  number  of  times  each  method  incurred  the  minimum  cost 
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among  the  three  methods.  We  used  a  query  whose  join  tree  is  a  complete  binary  tree 
of  7  relations  as  shown  in  Figure  4.21.  The  domains  of  the  random  values  of  input 
data  parameters  are  as  follows.  (Let  $  denote  {<  1,2  >,  <  1, 3  >,  <  2, 4  >,  <  2, 5  > 
,<  3,6  >,<  3,7  >}.) 

•  1.00  <  aij  <  10.00  for  <i,j  >G 

•  0.00  <  pf,  <  1.00  for  i  =  1, 2,  •  •  • ,  7. 

•  10  <  Nj^  <  500, 10  <  Nfj  <  Nf.aij  for  j  =  2, 3,  •  •  • ,  7.  (See  Equation  4.10.) 

•  10  <  Tf.  <  500  for  i  =  2,3,  •  •  • ,  7. 

•  O.SOoij  <  ^ij  <  l.OOoij  for  j  —  2,3.  (See  Equation  4.18.), 

O.SOajj'  <  Bij  <  l.SOojj  for  <  i,j  >G  and  i  1. 

•  0.30  <  df.  <dt<  1.00  for  i  =  1,2,  •  •  • ,  7. 

The  values  of  the  other  data  parameters  are  obtained  from  these  values  using  the 
relationships  between  data  parameters  discussed  in  Section  4. 4.1. 2. 2.  As  a  simphfi- 
cation,  we  assumed  no  merging  of  relation  fragments  in  the  nesting  step.  The  effect 
of  ignoring  the  merging  cost  on  the  cost  comparison  result  is  negHgible.  Accordingly, 
we  used  71  =  1,7^  =  for  <  i,j  >€  ^  and  j  1,  and  =  1  for  i  =  1,2, •••,7 
(See  Equation  4.38). 

Table  4.5  shows  the  average  values  and  the  winning  counts  (in  percentage)  ob¬ 
tained  from  5000  random  test  cases  for  the  transmission  cost  and  the  partial  local 
processing  cost,  respectively. 

It  was  mentioned  in  Section  4.3.5  that  the  SNR  transmission  cost  is  always  less 
than  the  SFR  transmission  cost,  but  the  RF  transmission  cost  has  a  trade-off  with 
the  SFR  transmission  cost.  Our  resiilt  showed  that  indeed  the  SNR  transmission 
always  costed  less  than  the  SFR  transmission.  Moreover,  it  was  observed  that  the 
RF  transmission  costed  less  than  the  SFR  transmission  for  all  test  cases,  even  though 
there  is  a  theoretical  trade-off  with  the  SFR  method.  The  average  value  of  the  SFR 
transmission  cost  was  a,bout  1500  times  higher  than  that  of  the  RF  transmission 
cost  and  about  1100  times  higher  than  that  of  the  SNR  transmission  cost.  The 
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Method 

Transmission 

Partial  local  processing 

Average 
data  size 

Average  cost 

wins 

Average  cost 

#wins 

LAN 

WAN 

SFR 

3413  Mbytes 

3.2  hours 

2.4  days 

0% 

.2.9  hours 

0% 

RF 

2.4  Mbytes 

8.1  secs 

2.4  mins 

67% 

15.2  secs 

100% 

SNR 

3.2  Mbytes 

11.1  secs 

3.2  mins 

33% 

17.5  secs 

0% 

(Transmission  time  is  elapsed  time  and  local  processing  time  is  CPU  time.) 
Table  4.5:  Costs  evaluated  using  Random  Data  Parameters 


treinsmission  costs  for  the  LAN  and  WAN  showed  the  same  relative  costs  among  the 
different  methods  except  that  the  WAN  incurred  about  18  times  higher  cost  than 
LAN. 

Since  we  assumed  in  our  cost  model  that  the  server  speed  and  the  cUent  speed  are 
the  s^Lme,  the  SNR  method  always  takes  the  same  cost  as  the  RF  method  and  incurs 
the  additional  cost  (Equation  4.44)  of  assembling  a  single  nested  relation  on  a  client. 
Therefore,  the  RF  local  processing  cost  is  always  less  than  the  SNR  local  processing 
cost.  Furthermore,  our  result  showed  that  the  RF  local  processing  incurred  less  cost 
than  the  SFR  local  processing  for  all  test  cases. 

For  the  SFR,  RF,  and  SNR  method,  the  partial  local  processing  cost  is  0.9,  1.9,  1.6 
times  the  LAN  transmission  cost  while  it  is  0.05,  0.1,  0.1  times  the  WAN  transmission 
cost.  If  we  consider  the  uncounted  cost  of  query  processing  and  reference  resolution, 
the  local  processing  cost  will  be  the  major  cost  in  the  LAN  environment  and  hardly 
ignorable  even  in  the  WAN  environment. 

It  is  interesting  to  see  that  the  SFR  transmission  cost  was  evaluated  to  be  about 
1400  and  1100  times  higher  than  the  RF  tr2msmission  cost  and  the  SNR  transmission 
cost,  respectively,  while  the  SFR  partial  local  processing  cost  was  evaluated  to  be  only 
590  and  600  times  higher  than  the  RF  and  SNR  local  processing  cost.  This  difference 
in  the  ratios  is  due  to  the  use  of  the  binary  search  tree  to  represent  nested  subrelations. 
As  mentioned  in  Section  4. 3. 3. 2,  a  binary  search  tree  incurs  0(log2  N)  time  where 
N  is  the  number  of  nodes  in  the  tree.  On  the  other  hand,  the  transmission  cost  for 
transmitting  those  N  tuples  is  linear  with  respect  to  N,  i.e.,  0{N).  This  observation 
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demonstrates  that  the  benefits  of  the  RF  method  and  the  SNR  method  become  more 
manifest  in  terms  of  reducing  the  transmission  cost  than  the  local  processing  cost. 


4.5.3  Dependency  on  Selectivity  and  Extra  Join  Attribute 
Ratio 

4.5.3. 1  Observation  using  Sample  Case  Test 

We  performed  cost  comparisons  using  sample  values  of  data  parameters  and  observed 
the  dependency  of  the  costs  on  the  values  of  a  single  aij  and  the  set  of  pf.,i  = 
1,2,  •  •  •  ,5.  Figure  4.22  shows  the  join  tree  and  the  nesting  format  tree  of  a  sample 
query.  The  sample  values  of  the  input  data  parameters  are  as  follows. 

•  -N/,.  =  500,800,300,1200,300  for  i  =  1,2, 3, 4, 5,  respectively. 

•  ai2  =  3.0,  ai3  =  1.0  ~  10.0,  034  =  4.0,  azs  =  1-0 


•  /9i2  =  2.7, /3i3  =  0.9ai3,/334  =  3.8 


•  —  200,300,250,100,400  for  i  =  1,2, 3,4, 5,  respectively. 


•  Ph  = 


0.05,0.1,0.15,0.05,0.05  or 
0.8, 0.9, 0.7, 0.6, 0.9 


for  i  =  1,2, 3, 4, 5,  respectively. 


The  other  data  parameters  are  computed  from  those  input  parameters  using  the 
relationships  between  data  parameters  discussed  in  Section  4.4. 1.2.2.  We  evaluated 
the  costs  using  those  parameter  values  while  varying  the  value  of  Ois  from  1  through 
10.  The  same  evaluation  has  been  repeated  for  the  two  different  sets  of  p/j’s. 

Table  4.6  shows  the  result  of  the  cost  evaluation,  and  Figure  4.23  shows  the  graphs 
of  the  costs  of  different  methods  with  respect  to  the  values  of  ais  for  the  two  different 
sets  of  p//s. 

0^13-  Increasing  the  value  of  013  without  changing  the  value  of  is  equivalent 
to  increasing  the  value  of  Nj^.  In  the  RF  method,  the  increase  of  Nf^  increases  the 
size  of  F3  only  and  has  no  effect  on  the  sizes  of  the  other  relation  fragments.  On  the 
other  hand,  its  effect  on  increasing  the  number  of  duplicate  subtuples  in  a  single  flat 
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Transmission  cost  (unit:  seconds) 

LAN 

WAN 

Low  pf. 

High  Pf, 

Low  pf, 

High  Pf, 

ai3 

SFR 

RF 

SNR 

SFR 

RF 

SNR 

SFR 

RF 

SNR 

SFR 

RF 

SNR 

1.0 

18.0 

2.2 

3.0 

3.5 

2.2 

0.6 

317.9 

39.4 

53.4 

62.4 

39.4 

10.9 

2.0 

36.0 

2.5 

BO 

2.5 

1.0 

635.8 

43.9 

79.1 

124.7 

43.9 

18.1 

3.0 

54.0 

2.7 

5.9 

10.6 

2.7 

1.4 

953.7 

48.4 

104.9 

187.0 

48.4 

25.3 

4.0 

72.1 

3.0 

7.4 

14.1 

3.0 

1.8 

1271.6 

52.9 

130.6 

249.4 

52.9 

32.5 

8.9 

17.7 

3.2 

2.3 

1589.5 

57.4 

156.3 

311.7 

57.4 

39.7 

6.0 

108.1 

3.5 

10.3 

21.2 

3.5 

2.7 

1907.3 

61.8 

182.1 

374.0 

61.9 

46.9 

11.8 

24.7 

3.8 

3.1 

2225.2 

66.4 

207.8 

436.4 

66.4 

54.1 

3.5 

2543.1 

70.9 

233.6 

498.7 

70.9 

61.4 

9.0 

162.1 

4.3 

14.7 

31.8 

4.3 

3.9 

2861.0 

75.4 

259.3 

561.0 

75.4 

68.6 

10.0 

180.1 

4.5 

4.3 

3178.9 

79.9 

285.1 

623.3 

79.9 

75.8 

Partial  local  processing  cost  (unit:  seconds) 

Low  Pf, 

High  Pf, 

SNR 

SFR 

RF 

SNR 

1.0 

13.3 

2.6 

3.0 

2.9 

1.6 

OS 

2.0 

26.7 

3.5 

mm 

2.5 

3.0 

40.3 

4.4 

5.2 

9.2 

2.6 

Ou 

4.0 

54.3 

5.6 

6.6 

12.8 

3.3 

5.0 

68.1 

6.7 

HB 

6.0 

82.0 

7.8 

mm 

7.0 

96.0 

8.9 

8.0 

110.4 

10.2 

mm 

9.0 

124.5 

11.4 

13.6 

31.2 

6.9 

10.0 

138.6 

03 

Table  4.6:  Costs  evaluated  using  the  sample  values  of  data  parameters 
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F, 


O^Z5 


51  =  UFj 

52  =  IIi^2 

Ss  =  n(i^3  ^  F^) 

S4  =  ni^4 


(a)  Join  tree  (b)  Nesting  format  tree  (c)  Assembly  plan 


Figure  4.22:  A  sample  query  for  observing  dependency  on  Q13  and  pf^ 


relation  or  multiply  occurring  subtuples  in  nested  subrelations  is  more  significant. 
We  can  verify  this  fact  from  Equation  4.11,  Equation  4.12,  and  Equation  4.15.  That 
is,  the  increase  of  Nj^  not  only  causes  the  increase  of  /S13  by  Equation  4.11,  but  also 
increases  N^^  according  to  Equation  4.12.  Similarly,  the  increase  of  /3f^  is  ‘amplified’ 
by  a  factor  of  N,^^i2Bza{=-  5130)  if  we  compute  Nt  according  to  Equation  4.15.  The 
cost  evaluation  result  showed  that  both  the  transmission  cost  and  the  partial  local 
processing  cost  increased  linearly  with  respect  to  the  value  of  0:13,  and  the  slope  was 
in  the  order  of  the  SFR,  SNR,  and  RF  methods,  from  the  highest  first. 

P/,  :  As  for  the  values  of  a  higher  value  of  py.  increases  the  overhead  due  to 
extra  join  attributes  in  the  RF  method  while  making  the  SFR  method  and  the  SNR 
method  more  efficient  by  reducing  the  tuple  size  of  a  single  flat  relation  and  nested 
subrelations,  respectively,  as  we  can  see  from  Equation  4.14  and  Equation  4.16.  The 
cost  evaluation  result  showed  that  costs  were  less  for  the  higher  values  of  p/,  ’s  for 
both  the  transmission  cost  and  the  local  processing  cost.  One  exception  is  the  RF 
transmission  cost,  in  which  case  the  transmission  cost  is  independent  of  the  values 
of  py/s,  as  we  can  verify  from  Equation  4.47.  In  particular,  the  SNR  transmission 
incurred  less  cost  than  the  RF  transmission  for  the  higher  values  of  py.  ’s. 
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Cost  (seconds)  Cost  (seconds) 


(a)  LAN  Transmission  cost  (b)  Partial  local  processing  cost 

Cost  for  lower  values  of  extra  join  attribute  ratios. 

Cost  for  hi^er  values  of  extra  join  attribute  ratios. 


(The  abscissa  denotes  the  value  of  ais  and  the  ordinate  denotes  cost  in  seconds. 
Lines  labeled  with  boxes  or  circles  are  those  obtained  for  lower  or  higher  values  of 
pfi's,  respectively.) 

Figure  4.23:  Costs  evaluated  using  the  sample  values  of  data  parameters 
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Domain  FF:  selectivity  =  1 .00  ~  10.00,  EJA  ratio  =  0.00  -  1.00 
Domain  HL;  selectivity  =  5.00  ~  10.00,  EJA  ratio  =  0.00  ~  0.50 
Domain  LH;  selectivity  =  1 .00  ~  5 .00,  EJA  ratio  =  0.50  -  1 .00 
EJA  ratios 


Selectivities 

Figure  4.24:  Domain  HL  and  domain  LH  vs.  Domain  FF  (full  ranges) 

4. 5. 3. 2  Observation  using  Simulation 

We  carried  out  the  cost  evaluations  using  random  values  of  data  parameters  with  the 
same  domains  as  those  used  in  Section  4.5.2,  but  this  time  for  different  domains  of 
(Xij  s  and  s.  The  following  two  different  domains  were  used  for  generating  random 
values  of  a^’s  and  p//s. 

•  Domain  HL:  (Higher  values  of  a^-  and  lower  values  of 

5.00  <  aij  <  10.00  for  <  i,j  >e  9  and  0.00  <  p/,.  <  0.50  for  i  =  1, 2,  •  ■  • ,  7. 

•  Domain  LH:  (Lower  values  of  a,j  and  higher  values  of  p/,  .) 

1.00  <  aij  <  5.00  for  <  ij  >e  ^  and  0.50  <  pf^  <  1.00  for  i  =  1,  2,  •  •  • ,  7. 

Figure  4.24  contrasts  the  Domain  HL  and  Domain  LH  with  the  domain  of  the  full 
range  of  values  of  a^j  and  pj^  that  were  used  in  Section  4.5.2. 

Table  4.7  shows  the  result  of  cost  evaluations.  For  Domain  HL,  the  RF  method 
showed  more  favorable  result  than  the  result  shown  in  Tables  4.5  in  terms  of  both 
the  average  cost  and  the  winning  counts  than  the  SFR  or  SNR  method.  On  the 
other  hand,  for  domain  LH,  the  RF  method  showed  less  favorable  result.  Thus,  we 
confirmed  that  the  observations  made  in  Section  4.5.3. 1  are  generally  true  for  other 
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Transmission  Partial  local  processing 

Average  Average  cost  ^wins  Average  cost  #wins 

data  size  LAN  WAN _ 

33878  Mbytes  32.0  hours  23.5  days  0%  30.2  hours  0% 

4.1  Mbytes  14.0  secs  4.1  mins  93%  31.7  secs  100% 

8.8  Mbytes  29.9  secs  8.8  mins  7% _ 36.6  secs _ 0% 

(a)  Domain  HL  (5.00  <  <  10.00,0.00  <  pf,  <  0.50) 

Transmission  Partial  local  processing 

Method  Average  Average  cost  [  #wins '  Average  cost  #wins 

data  size  LAN  WAN _ 

SFI;  47.0  Mbytes  2.7  mins'  46.9  mins  0%  2.0  mins  0.8% 

RF  0.86  Mbytes  2.9  secs  51.8  secs  22%  4.0  secs  99.2% 

SNR  0.53  Mbytes  1.8  secs  31.8  secs  |  78%  |  4.6  secs  |  0% 

*  (b)  Domain  LH  (1.00  <  aij  <  5.00,0.50  <  pf^  <  1.00) 

(Transmission  time  is  elapsed  time  and  local  processing  time  is  CPU  time.) 
Table  4.7:  Costs  evaluated  using  random  data  parameter  values  with  biased  a^j’s  and 

P/.’s 
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values  of  data  parametes  as  well.  It  is  interesting  to  note  that,  for  Domain  Lf[,  there 
were  some  cases  in  which  the  SFR  method  won  over  the  RF  method  in  the  partial 
local  processing  cost. 


4.6  Summary  and  Future  Work 
4.6.1  Summary 

We  have  developed  the  mechanisms  of  three  different  methods  for  instantiating  ob¬ 
jects  £rom  a  remote  relational  database  server  by  materializing  a  view  query  and 
restructuring  the  query  result  into  a  nested  relation  and  resolving  references  among 
them.  The  three  different  methods  were  the  single  flat  relation  (SFR)  method,  the 
relation  fragment  (RF)  method,  and  the  single  nested  relation  (SNR)  method,  named 
after  the  data  structure  transmitted  from  a  server  to  a  client  in  each  method. 

Rigorous  algorithms  have  been  developed  for  each  step  of  the  object  instantiation 
process,  mainly  focusing  on  the  transmission  and  the  nesting  step  of  the  translation, 
and  a  partial  cost  model  has  been  developed.  We  have  excluded  the  query  processing 
cost  and  the  reference  resolution  cost  to  simplify  our  work,  because  these  two  costs  are 
the  same  in  all  three  methods.  We  have  performed  cost  comparisons  using  randomly 
generated  data  parameter  values;  and  using  sample  data  parameter  values  for  varying 
values  of  a  selectivity  (a,j)  and  for  higher  and  lower  values  of  extra  join  attribute  ratios 

iPj^ 

The  result  of  the  cost  comparison  demonstrated  that  the  RF  method  and  the 
SNR  method  are  more  efRcient  than  the  common  SFR  method  in  terms  of  .both  the 
transmission  cost  and  the  local  processing  cost.  Therefore,  the  RF  and  SNR  methods 
are  useful  not  only  for  remote  database  systems  but  also  for  local  database  systems. 
Besides,  the  RF  and  SNR  methods  are  useful  for  disk-storage  database  systems  as 
well  as  main  memory  database  systems  although  the  benefit  of  the  RF  and  SNR 
methods  is  relatively  less  for  the  disk-storage  database  systems  due  to  the  significant 
cost  of  disk  accesses. 

The  RF  method  wins  over  the  SNR  method  more  frequently.  Therefore,  the  RF 
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method  is  the  most  preferred  method  if  we  have  to  choose  one  of  the  three  methods. 
There  remains  an  optimization  issue  of  choosing  between  the  RF  method  and  the 
SNR  method  depending  on  the  query  and  the  speed  of  a  server  and  a  client.  (If  the 
server  runs  slower  than  the  client,  it  is  more  favorable  to  the  RF  method  than  the 
SNR  method  because  the  SNR  method  performs  the  nesting  step  on  a  server.  On  the 
other  hand,  if  the  server  runs  faster  than  the  chent,  it  is  more  favorable  to  the  SNR 
method.  Note  that  we  assumed  that  the  server  speed  and  the  client  speed  are  the 
same  for  the  cost  comparison.) 

We  have  not  considered  the  possibility  of  main  memory  overflow  in  case  the 
amount  of  data  retrieved  as  the  result  of  a  query  exceeds  the  amount  of  available 
main  memory  space.  Concern  about  main  memory  overflow  does  not  discourage  the 
use  of  the  RF  or  SNR  method  because  it  is  evident  that  the  SFR  method  will  suffer 
more  severely  Rom  the  shortage  of  main  memory  space  than  the  RF  or  SNR  method 
because  the  SFR  method  carries  more  redundant  data. 

4.6.2  Future  Work 

We  discuss  further  work  in  two  directions.  First,  the  improvement  of  the  efficiency  of 
the  RF  method  and  the  SNR  method,  and  secondly,  handling  left  outer  joins  in  each 
of  the  three  methods.  Remember  that  we  have  dealt  with  only  inner  joins  for  a  query 
in  this  portion  of  our  work. 

4.6.2. 1  Improving  the  Efficiency  of  the  RF  and  SNR  Methods 

As  mentioned  in  Section  4.3,  we  placed  more  effort  in  making  the  SFR  method  efficient 
than  the  RF  or  SNR  method  because  our  objective  was  to  demonstrate  that  the  RF 
method  and  the  SNR  method  are  more  efficient  than  the  SFR  method.  The  RF 
method  and  SNR  method  were  designed  to  be  rather  simple  than  utmost  efficient. 
We  present  here  some  ideas  that  are  worth  pursuing  to  improve  the  efficiency  of  the 
RF  method  and  the  SNR  method. 

In  the  current  RF  method,  a  cHent  carries  out  the  index  creation  and  navigational 
joins  on  relation  fragments  and  hence  a  server  must  send  extra  join  attributes  to 
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make  those  operations  possible.  As  discussed  in  Section  4.3.5  and  demonstrated  in 
Section  4.5.3. 1,  extra  join  attributes  are  the  source  of  redundant  data  in  the  RF 
method.  One  idea  for  avoiding  the  transmission  of  the  extra  join  attributes  is  to  have 
a  server  send  the  necessary  linkage  information  in  the  form  of  physical  pointers  to 
the  linked  tuples  of  the  other  relations.  In  order  to  map  between  the  heterogeneous 
address  spaces  of  a  server  and  a  client,  offset  addresses  can  be  used  as  long  as  it  can 
be  ensured  that  each  relation  fragment  is  allocated  in  a  contiguous  memory  space. 
Sending  physical  pointers  will  reduce  the  transmission  cost  by  not  sending  extra  join 
attributes  and  the  redundant  tuples  introduced  by  the  extra  join  attributes.  Moreover, 
all  a  client  has  to  do  is  to  follow  the  pointers  to  build  a  single  nested  relation  out 
of  the  relation  fragments.  Thus,  it  reduces  the  load  on  a  client.  However,  a  server 
has  to  pay  the  price  of  index  creation  and  navigational  join  to  produce  the  physical 
pointers.  A  direct  consequence  of  this  requirement  is  that  the  duplicate  elimination 
step  cannot  be  pipebned  with  the  transmission  of  tuples  and  more  load  is  placed  on  a 
server.  Besides,  the  transmission  protocol  becomes  more  complicated  because,  unlike 
the  case  of  sending  extra  join  attributes,  the  number  of  physical  pointers  attached  to 
each  tuple  varies  depending  on  the  number  of  matching  tuples. 

As  for  the  SNR  method,  the  current  SNR  method  has  the  overhead  of  dealing 
with  multiply  occurring  subtuples  in  nested  subrelations.  One  idea  of  eliminating 
these  multiply  occurring  subtuples  is  to  achieve  more  compaction  of  the  transmitted 
data  by  using  backward  pointers  embedded  in  the  formatted  stream  of  nested  tuples. 
These  backward  pointers  replace  the  actual  tuples  with  pointers  to  the  previously 
sent  identical  tuples.  It  will  make  the  transmission  protocol  and  the  assembly  process 
(Algorithm  4.3.16)  more  comphcated,  but  will  reduce  the  transmission  cost. 

The  two  ideas  described  so  far  have  their  major  benefit  in  reducing  the  transmis¬ 
sion  cost.  Therefore,  these  ideas  are  more  useful  in  the  wide  area  network  (WAN) 
environment  where  the  transmission  cost  is  the  dominant  cost. 

Finally,  we  have  used  the  RF  materialization  as  an  intermediate  step  of  the  SNR 
materialization  because,  as  mentioned  in  Section  4.3.1,  a  direct  materiahzation  dis¬ 
ables  the  join  reordering  by  a  query  optimizer.  It  will  be  worthwhile  to  compare 
the  cost  reduction  achievable  by  using  the  direct  materialization  of  a  single  nested 
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relation  and  the  cost  reduction  achievable  by  utilizing  the  join  reordering  aveiilable 
from  a  query  optimizer. 

4.6.2. 2  Handling  Left  Outer  Joins 

We  have  simplified  our  work  by  not  considering  left  outer  joins  in  a  query  although 
left  outer  joins  are  required  frequently  to  prevent  information  loss.  Thus,  it  will  make 
our  work  more  complete  if  we  discuss  the  handling  of  left  outer  joins  in  each  step 
of  the  different  object  instantiation  methods,  before  ending  this  chapter.  Since  we 
designed  the  SNR  method  using  the  same  query  materialization  and  nesting  processes 
as  the  RF  method,  we  discuss  only  the  SFR  method  and  the  RF  method. 

The  consideration  of  left  outer  joins  requires  the  handling  of  non-matching  tuples 
in  the  join  evaluations  of  the  query  materialization  step,  and  the  processing  of  null 
tuples  in  subsequent  steps.  We  state  briefly  the  key  points  of  handling  left  outer  joins 
at  each  step. 

In  the  SFR  method,  the  query  processing  algorithm  described  in  Algorithm  4.3.1 
should  be  modified  so  that  if  a  join  is  a  left  outer  join  and  there  exists  no  matching 
tuple  in  the  destination  relation,  null  tuples  are  inserted  in  place  of  the  tuples  of  the 
destination  relation  and  its  child  relations  in  the  join  tree. 

For  each  Ri 

if  ti  satisfies  then  continue 
else 

Set  ii  and  all  ti  €  Rj^s  to  null  where  Rj,j  ^  i,  are  the  relations  in  the 
subtree  of  the  join  tree  rooted  by  Ri]  Continue, 
where  ‘continue’  means  to  continue  the  nested  loop  join  on  the  rest  of  the  relations 
that  have  not  yet  been  processed.  On  the  other  hand,  in  the  RF  method,  the  query 
processing  algorithm  described  in  Algorithm  4.3.2  should  be  modified  so  that  if  a  join 
is  a  left  outer  join  and  there  exists  no  matching  tuple  in  the  destination  relation, 
the  joins  in  a  subtree  of  the  join  tree  rooted  by  the  destination  relation  are  skipped. 
Consequently,  no  null  subtuple  is  inserted  to  any  relation  fragment. 


For  each  t,  €  Ri 
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if  ti  satisfies  then  continue 
else 

Skip  all  Rj’s  where  Rj,j  /  i,  are  the  relations  in  the  subtree  of  the  join 
tree  rooted  by  R{-,  Continue. 

Duplicate  elimination  process  is  the  same  as  Algorithm  4.3.3. 

A  single  nested  relation  which  is  produced  by  the  nesting  step  does  not  contain 
null  subtuples  at  all.  Therefore,  the  SFR  nesting  step  as  described  in  Algorithm  4.3.5 
should  be  modified  so  that  any  decomposed  subtuple  all  of  whose  column  values  are 
nulls  is  discarded.  To  achieve  this  modification,  we  should  place 

‘If  tj  =  A  (a  null  tuple)  then  return.’ 

in  front  of 

‘wr  :=  the  root  pointed  by  Wi.Ui 
which  is  the  first  line  of  Algorithm  4.3.6. 

The  RF  nesting  needs  some  modifications  as  well.  First  of  all,  the  join  purge  of 
Section  4. 3. 3. 5.1  is  not  applicable  to  a  left  outer  join.  Theorem  4.3.1  does  not  hold 
for  dangling  tuples  in  the  source  relation  of  a  left  outer  join.  For  example,  given  a  left 
outer  join  Fi  [X  F2  from  a  relation  fragment  F\  to  another  relation  fraement 
F’2  with  conjunctive  join  predicates  pi  A  p2  A  •  •  •  A  p*.,  it.  is  possible  that  some  of  the 
dangling  tuples  in  F-y  appear  to  have  matching  tuples  in  F2  if  only  one  of  pi,p2  •  *  ■ 
is  evaluated,  while  in  fact  there  exists  no  matching  tuple  for  a  conjunction  of  all 
join  predicates,  Pi  A  p2  A  •  •  •  A  p^.  The  assembly  planning  step  (Section  4. 3. 3. 5. 2) 
and  the  index  creation  (Section  4. 3. 3. 5. 3)  step  need  no  modification  because  they 
have  nothing  to  do  with  join  evaluations.  On  the  other  hand,  the  navigational  join 
step  (Section  4.3.3. 5.4)  performs  join  evaluations  and  thus  should  be  modified  to 
distinguish  between  inner  joins  and  left  outer  joins.  Algorithm  4.3.14  (Match)  always 
returns  one  or  more  matching  tuples  if  the  evaluated  join  is  an  inner  join  but  may 
return  no  matching  tuple  if  the  evaluated  join  is  a  left  outer  join.  Accordingly, 
Algorithm  4.3.13  should  be  modified  so  that  if  Match(ti_i,  Fi,  $i)  returns  no  matching 
tuple  and  the  evaluated  join  is  a  left  outer  join  then  skip  the  rest  of  the  ‘for  each’ 
statements  and  set  ■  ,tk  to  nulls  before  executing  Step  3a  through  Step  3d. 
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If  we  had  considered  the  effect  of  nulls  generated  by  left  outer  joins  in  our  work, 
the  result  of  cost  comparison  would  have  appeared  to  be  even  more  favorable  to  the 
RF  method  and  the  SNR  method.  The  reason  is  that,  in  a  single  flat  relation,  nulls 
are  duplicated  in  the  same  way  the  other  tuples  (which  are  not  nulls)  are  duplicated. 
Note  that  there  is  no  duplicate  tuple  in  a  relation  fragment  or  a  single  nested  relation. 


Chapter  5 
Conclusion 


In  this  thesis,  we  addressed  two  problems  -  outer  join  and  instantiation  efficiency 
-  in  the  view-object  framewprk,  i.e.,  in  the  framework  of  instantiating  objects  from 
relational  databases  through  views.  First,  we  introduced  the  view-object  framework 
starting  from  a  general  framework  of  integrating  objects  and  databases.  Then,  given 
the  framework,  we  made  three  major  contributions  as  summarized  below. 

•  We  defined  a  rigorous  system  model  in  order  to  embody  the  concept  of  inter¬ 
facing  between  objects  and  relations.  The  system  model  consists  of  three  parts: 
an  object  type  model,  a  data  model,  and  a  view  model.  An  object  type  defines 
the  nested  structure  of  objects.  The  non-null  option  is  used  to  specify  object 
attributes  that  are  prohibited  from  being  nulls.  Data  model  uses  the  relational 
model  and  includes  integrity  constraints  as  part  of  the  model.  A  view  consists 
of  a  relational  select-project-join  query  and  an  attribute  mapping  function  for 
mapping  between  object  attributes  and  relation  attributes.  It  was  beyond  our 
scope  to  formulate  a  query  or  derive  an  attribute  mapping  function  for  a  given 
view,  so  that  we  assumed  that  a  query  and  an  attribute  mapping  function  were 
predefined  in  each  pertinent  object  type.  The  system  model  thus  developed 
provided  the  basis  for  developing  a  simple  solution  to  the  outer  join  problem 
and  a  part  of  the  system  model  was  used  for  the  instantiation  efficiency  problem 
as  well. 
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•  We  developed  a  mechanism  for  having  the  system  decide  which  join  should  be 
an  inner  join  and  which  join  should  be  a  left  outer  join,  given  a  view-query, 
and  generate  non-null  filters  on  the  relations  specified  in  the  view- query.  Users 
are  required  only  to  specify  non-nuU  constraints  on  object  attributes  whose 
values  should  not  be  null.  AH  joins  in  a  view-query  are  initialized  as  left  outer 
joins.  Those  non-null  constraints  on  object  attributes  are  mapped  to  non- null 
constraints  on  relation  attributes  of  the  query  result.  The  non-null  constraints 
on  relation  attributes  are  then  used  to  prescribe  non-null  filters  on  the  attribute 
of  base  relations  and  replace  left  outer  joins  sitting  on  the  join  path  from  a  pivot 
relation  to  the  non-null  constrained  relations  by  inner  joins.  The  remaining  left 
outer  joins  are  further  reduced  into  inner  joins  if  certain  integrity  constraints 
are  satisfied.  Besides,  unnecessary  non-nuH  filters  are  eliminated. 

•  We  developed  two  new  methods  of  instantiating  objects  from  remote  relational 
databases,  which  are  far  more  efficient  than  the  conventional  method  of  retriev¬ 
ing  a  single  fiat  relation  (SFR).  One  of  the  two  new  methods  retrieves  a  query 
result  as  a  set  of  relation  fragments  (RF’s).  The  other  method  retrieves  a  query 
result  as  a  single  nested  relation  (SNR).  We  called  the  two  new  methods  as 
the  RF  method  and  the  SNR  method  while  we  called  the  conventional  method 
as  the  SFR  method.  The  algorithms  of  the  three  object  instantiation  methods 
(SFR,  RF,  and  SNR)  were  described  rigorously.  Then,  we  derived  cost  formulas 
based  on  the  algorithms  and  compared  the  estimated  costs  of  the  three  methods. 
Two  techniques  were  used  for  cost  comparison:  sample  case  test  and  simulation. 
The  cost  comparison  result  showed  that  the  RF  method  and  the  SNR  method 
are  far  more  efficient  than  the  SFR  method  for  both  the  transmission  cost  and 
the  local  processing  cost. 
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Measurement  of  Cost  Parameters 


The  values  of  cost  parameters  were  measured  using  programs  that  are  sufficiently 
realistic  to  be  part  of  an  actual  implementation.  As  mentioned  in  Section  4.4.1. 1,  we 

use  CPU  time  for  main  memory  cost  and  an  elapsed  time  for  network  communication 
cost. 


A.l  Main  Memory  Cost  parameters 

We  used  Unix  clock  system  call  for  measuring  the  CPU  time  of  the  elementary  main 
memory  operations  shown  in  Table  4.1.  The  time  resolution  of  the  clock  is  1/60 
seconds  while  main  memory  operations  take  as  httle  as  a  few  microseconds.  The 
poor  resolution  of  clock  made  it  impossible  to  measure  the  precise  values  of  main 
memory  cost  parameters.  Moreover,  the  execution  time  varies  every  time  the  same 
code  is  run,  depending  on  the  system  load.  Thus,  we  obtained  the  values  shown  in 
Tables  4.1  by  repeating  the  same  code  one  million  times  and  computing  an  average 
value. 

The  cost  parameter  value  varies  depending  on  how  many  subprocedures  are  called 
during  execution.  We  can  actually  define  as  many  subprocedures  as  we  vrant.  Ac> 
cording  to  our  experiment  on  Sun-3,  the  invocation  of  a  subprocedure  which  requires 
four  arguments  took  about  5  to  6  //.secs,  which  is  a  large  amount  of  time  for  a  main 
memory  operation.  Thus,  we  excluded  the  effect  of  subprocedure  invocation  from 
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our  measurement  by  writing  a  dummy  subprocedure  requiring  the  same  set  of  input 
parameters  as  a  counterpart  for  each  subprocedure  and  subtracting  the  time  required 
to  invoke  dummy  subprocedures  from  the  total  time.  This  approach  means  that  our 
cost  parameter  values  are  the  minimum  values  considering  only  the  ‘plain’  code  ex¬ 
ecution  time.  One  exception  is  that  we  did  not  subtract  a  subprocedure  invocation 
time  if  we  judged  that  the  code  must  use  a  subprocedure,  intrinsically  independently 
of  who  writes  the  code. 

Now  we  comment  on  some  details  of  how  each  cost  parameter  was  obtained. 

•  Chs’  We  used  an  implementation  of  Algorithm  4.3.7  for  tuple  sizes  of  100  to  500 
bytes.  Tuples  were  initialized  with  pseudo-randomly  generated  base-64  ASCII 
strings.  The  values  of  Cbs  using  those  random  tuples  were  measured  to  be 
independent  of  the  tuple  size. 

•  Cent'  We  measured  the  time  for  comparing  two  tuples  of  size  100  to  500  bytes 
where  each  tuple  was  initialized  with  pseudo-random  base-64  ASCII  string,  and 
obtained  the  same  value  independently  of  the  tuple  size. 

•  Ccij  Ccb:  We  measured  the  time  for  copying  a  tuple  of  size  100  to  1000  bytes. 
The  measured  time  was  linear  with  respect  to  the  tuple  size. 

•  Cg:  The  time  for  evaluating  equijoins  on  attributes  of  type  integer  was  measured 
using  a  code  written  for  more  general  joins  including  non-equijoins  on  non¬ 
integer  attributes.  We  used  the  type  integer  because  it  frequently  happens 
that  joins  are  performed  on  key  attributes  and  the  key  attributes  are  integers. 
We  used  the  address  of  the  join  attributes,  and  their  sizes  and  types  as  input 
parameters  and  did  hot  count  the  time  for  obtaining  those  values  themselves. 

•  Cfi:  Folding  was  done  by  dividing  a  tuple  into  integer  segments  and  adding 
up  the  values  of  the  segments.  The  tuple  size  used  was  100  to  500  bytes.  The 
measured  time  was  proportional  to  the  tuple  size. 

•  Chc*  We  measured  the  time  for  hashing  computation  on  a  psedo-randomly  gen¬ 
erated  integer  hashing  key  using  two  different  hashing  methods:  the  division 
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method  and  the  multiplicative  hashing  method  [94].  The  value  shown  in  Ta¬ 
ble  4,1  is  for  the  multiplicative  hashing  method. 

•  Cma'.  Our  experiment  showed  that  Unix  memory  allocator  (maJloc)  takes  about 
130  fisec  on  Sun-3  without  regard  to  the  allocated  memory  size  while  the  other 
main  memory  operations  takes  only  a  couple  of  tens  of  miscroseconds.  There¬ 
fore,  if  we  used  maUoc  for  our  work,  the  memory  management  cost  would  be¬ 
come  dominant.  However,  it  is  a  common  practice  to  pre-allocate  a  working 
space  [9,  96]  to  facilitate  faster  memory  allocation  and  garbage  collection.  Then, 
memory  allocation  takes  only  the  cost  of  moving  a  stack  pointer  within  the  pre- 
allocated  working  space  as  long  as  the  working  space  need  not  be  expanded. 
We  assumed  the  usage  of  a  working  space  mechanism. 

•  Cmp'.  The  time  for  reading  or  writing  a  pointer  value  is  so  small  that  it  hardly 
affects  the  cost  computation  result.  Nevertheless  we  use  it  for  completeness. 

•  Cpi,  Cpb'.  We  measured  time  for  projecting  a  tuple  of  size  500  bytes  on  a  varying 
number  of  32  byte  columns.  The  measured  time  was  proportional  to  the  total 
size  of  projected  subtuple. 

•  Cai,  Can'.  We  measured  the  costs  of  reading  a  join  column  of  size  8  bytes  while 
scanning  a  relation,  and  computing  an  integer  hashing  key  from  the  read  column 
value.  The  size  of  a  column  (8  bytes)  are  reasonable  because  it  is  likely  that  join 
attributes  are  of  type  (short  or  long)  integer.  We  assumed  tuples  are  allocated 
contiguously  within  main  memory.  The  measured  time  was  linear  with  respect 
to  the  number  of  scanned  tuples. 


A. 2  Network  Communication  Cost  Parameters 

The  values  of  network  communication  cost  parameters  {Ci,Cb)  depend  on  the  commu¬ 
nication  media.  It  is  well  known  that  the  data  rate  is  10  Mbps  for  Ethernet  [65]  used 
in  the  LAN  environment.  Cheriton  and  Williamson  [69]  measured  the  communication 
latency  (C/)  and  the  per- byte  communication  cost  {Cb)  on  an  idle  10  Mbps  Ethernet 
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connecting  two  SUN-3/75’s,  and  obtained  Ci  =  2.23  msec  and  Cb  =  1.8/isec.  As  for 
the  WAN,  in  [66]  it  is  stated  that  the  data  rate  is  about  56  Kbps  for  the  highest  speed 
leased  phone  hne  in  normal  use  while  1.544  Mbps  T1  NSFnet  [67,  68]  hnes  are  used 
in  a  few  places  where  the  high  cost  (in  terms  of  financial  investment)  is  acceptable. 
However,  the  current  status  of  technological  development  has  come  to  the  point  that 
T1  hnes  are  in  practical  usage  for  the  NSFnet  and  the  availabihty  of  T3  hnes  (45 
Mbps)  is  promised  in  near  future. 

We  measured  the  elapsed  time  for  transmitting  data  from  a  SUN-3/60  on  the 
Stanford  University  Ethernet  LAN  to  another  SUN-3/60  on  the  same  LAN,  and  also 
to  a  SUN-4  on  the  University  of  Hhnois  via  the  Tl  NSFnet  WAN.  The  chent  part  of 
the  code  repeated  the  transmission  of  different  amounts  of  data  (0,  IK,  4K,  8K,  12K, 
16K  bytes  of  data  plus  10  bytes  of  header)  60000  times  and  averaged  the  measured 
round  trip  times.  The  server  part  of  the  code  was  written  to  send  an  acknowledgement 
so  that  the  chent  part  can  mea.sure  the  round  trip  time.  The  measured  time  is  from 
main  memory  to  main  memory.  It  does  not  include  any  disk  access  cost  but  does 
include  main  memory  execution  time  for  iterations,  buffer  pointer  movements,  and 
sending  an  acknowledgement. 

Figure  A.l  shows  the  average  elapsed  round  trip  times  measured  for  vaying  data 
sizes  on  the  LAN  and  the  WAN,  respectively.  The  measured  round  trip  times  were 
almost  hneax  with  respect  to  the  amount  of  transmitted  data.  We  computed  the  (ap¬ 
proximate)  values  of  the  two  communication  cost  parameters,  Ci  and  Cb,  by  equating 
the  measured  round  trip  times  to  2  *  Cj  4-  Cj,  x  Size  for  different  values  of  Size  =  10, 
1034,  4106,  8202,  12298,  and  16394  bytes.  (We  did  not  use  12298  and  16394  bytes 
for  the  WAN.) 


124 


APPENDIX  A.  MEASUREMENT  OF  COST  PARAMETERS 


Bibliography 

[1]  F.  Bancillion,  et  al.,  “The  Design  and  Implementation  of  O2,  an  Object- 
Oriented  Database  System,”  in  ‘Advances  in  Object-Oriented  Database  Sys¬ 
tems’,  Springer- Verlag,  September  1988. 

[2]  0.  Deux,  et  al.,  “The  Story  of  O2,”  IEEE  Transactions  on  Knowledge  and  Data 
Engineering,  vol.  2,  no.  1,  March  1990,  pp.  91-108. 

[3]  R.  Agrawal,  and  N.  Gehani,  “ODE  (Object  Database  and  Environment):  The 
Language  and  the  Data  Model,”  Proceedings  of  the  ACM  SIGMOD  International 
Conference  on  Management  of  Data,  Portland,  Oregon,  May-June  1989. 

[4]  D.  Maier,  and  J.  Stein,  “Development  of  an  Object-Oriented  DBMS,”  Proceed¬ 
ings  of  the  OOPSLA  International  Conference  on  Object-Oriented  Programming 
Systems,  Languages,  and  Apphcations,  September  1986,  pp.  472-482. 

[5]  S.  Ford,  et  al.,  “Zeitgeist:  Database  Support  for  Object-Oriented  Programming,” 
Proceedings  of  the  International  Workshop  on  Object-Oriented  Database  Sys¬ 
tems,  1988,  pp.  23-42. 

[6]  W.  Kim,  N.  Chou,  and  J.  Garza,  “Integrating  an  Object-Oriented  Programming 
System  with  a  Database  System,”  Proceedings  of  the  OOPSLA  International 
Conference  on  Object-Oriented  Programming  Systems,  Languages,  and  Appli¬ 
cations,  September  1988,  pp.  142-152. 

[7]  W.  Kim,  J.  Garza,  N.  Ballou,  and  D.  Woelk,  “Architecture  of  the  OIRON  Next- 
Generation  Database  System,”  IEEE  Transactions  on  Knowledge  and  Data  En¬ 
gineering,  vol.  2,  no.  1,  March  1990,  pp.  109-124. 


125 


126 


BIBLIOGRAPHY 


[8]  D.  Fishman,  et  al.,  “Iris:  An  Object-Oriented  Database  Management  System,” 
ACM  Transactions  on  Office  Information  Systems,  vol.  5,  no.  1,  January  1987, 
pp.  48-69. 

[9]  K.  Wilkinson,  P.  Lyngbaek,  and  W.  Hasan,  “The  Iris  Architecture  and  Imple- 
itientation,”  IEEE  Transactions  on  Knowledge  and  Data  Engineering,  vol.  2,  no. 
1,  March  1990,  pp.  63-75. 

[10]  M.  Stonebraker,  and  L.  Rowe,  “The  Design  of  POSTGRES,”  Proceedings  of  the 
ACM  SIGMOD  International  Conference  on  Management  of  Data,  1986,  pp. 
340-354. 

[11]  M.  Stonebraker,  L.  Rowe,  and  M.  Hirohama,  “The  Implementation  of  POST¬ 
GRES,”  IEEE  Transactions  on  Knowledge  and  Data  Engineering,  vol.  2,  no.  1, 
March  1990,  pp.  125-142. 

[12]  T.  Learmont,  and  R.  Cattell,  “An  Object-Oriented  Interface  to  a  Relational 
Database,”  Proceedings  of  the  International  Workshop  on  Object-Oriented 
Database  Systems,  1987. 

[13]  G.  Wiederhold,  “Database  Design  (2nd  ed.),”  McGraw-Hill  Book  Company, 
1983. 

[14]  G.  Wiederhold,  “Views,  Objects,  and  Databases,”  IEEE  Computer,  December 
1986,  pp.  37-44. 

[15]  T.  Barsalou,  W.  Sujansky,  and  G.  Wiederhold,  “Expert  database  systems  in 
medicine — The  PENGUIN  project”.  Proceedings  of  the  AAAI  Spring  Sympo¬ 
sium  on  AI  in  Medicine,  Stanford  University,  CA,  March,  1990,  pp.  14-18. 

[16]  T.  Barsalou,  and  G.  Wiederhold,  “Complex  objects  for  relational  databases”. 
Computer  Aided  Design  (Special  issue  on  object-oriented  techniques  for  CAD), 
vol.  22,  no.  8,  1990,  pp.  458-468. 


BIBLIOGRAPHY 


127 


[17]  T.  Barsalou,  “View  objects  for  relational  databases”,  Ph.D.  thesis,  Medical  In¬ 
formation  Sciences  Program,  Stanford  University,  1990.  (Also  published  as  a 
technical  report  No.  STAN-CS-90-1310,  Computer  Science  Department). 

[18]  B.  Cohen,  “Views  and  Objects  in  OBI:  A  Prolog-based  View-Object-Oriented 
Database,”  Technical  report  TR.PRRL-88-TR-005,  SRI,  March  1988. 

[19]  G.  Wiederhold,  T.  Barsalou,  and  S.  Chaudhuri,  “Managing  Objects  in  a  Rela¬ 
tional  Framework,”  Technical  report  CS-89-1245,  Stanford  University,  January 
1989. 

[20]  A.  Paepcke,  “PCLOS:  A  Flexible  Implementation  of  CLOS  Persistence,”  Pro¬ 
ceedings  of  the  European  Conference  on  Object-Oriented  Programming,  Oslo, 
Norway,  August  1988. 

[21]  Personal  communication  with  Andreas  Paepcke,  Hewlett-Packard  Labs.,  Palo 
Alto,  California,  November  1990. 

[22]  K.  Law,  G.  Wiederhold,  T.  Barsalou,  N.  Siambela,  W.  Sujansky,  D.  Zingmond, 
and  H.  Singh,  “An  Architecture  for  Managing  Design  Objects  in  a  Sharable 
Relational  Framework,”  International  Journal  of  Systems  Automation,  Research 
and  Apphcations,  International  Society  for  Productivity  Enhancement. 

[23]  K.  Law,  G.  Wiederhold,  T.  Barsalou,  N.  Siambela,  W.  Sujansky,  and  D.  Zing¬ 
mond,  “Managing  Design  Objects  in  a  Sharable  Relational  Framework,”  Pro^ 
ceedings  of  the  ASME  International  Conference  on  Computers  in  Engineering, 
Boston,  MA,  1990. 

[24]  K.  Law,  T.  Barsalou,  and  G.  Wiederhold,  “Management  of  Complex  Structural 
Engineering  Objects  in  a  Relational  Framework,”  Engineering  with  Computers, 
6:81-92,  1990. 

[25]  W.  Rubenstein,  M.  Kubicar,  and  R.  Cattell,  “Benchmarking  Simple  Database 
Operation,”  Proceedings  of  the  ACM  SIGMOD  International  Conference  on 
Management  of  Data,  May  1987. 


128 


BIBLIOGRAPHY 


[26]  E.  Codd,  “A  Relational  Model  of  Data  for  Large  Shared  Data  Banks,”  Commu¬ 
nications  of  the  ACM,  vol.  13,  no.  6,  June  1970. 

[27]  D.  Tsichritzis,  and  A.  Klug  (eds.),  “The  ANSI/X3/SPARC  DBMS  Framework: 
Report  of  the  Study  Group  on  Data  Base  Management  Systems,”  Information 
Systems  3,  1978. 

[28]  D.  Jardine  (ed.),  “The  ANSI/SPARC  DBMS  Model,”  Proceedings  of  the  Second 
SHARE  Working  Conference  on  Data  Base  Management  Systems,  Montreal, 
Canada,  April  26-30,  1976. 

[29]  R.  Haskin,  and  R.  Lorie,  “On  Extending  the  Functions  of  a  Relational  Database 
System,  Proceedings  of  the  ACM  SIGMOD  International  Conference  on  Man¬ 
agement  of  Data,  June  1982,  pp.  207-212. 

[30]  R.  Lorie,  and  W.  Plouffe,  “Complex  Objects  and  Their  Use  in  Design  Transac¬ 
tions,”  Proceedings  of  the  IEEE  Annual  Meeting- Database  Week:  Engineering 
Design  AppHcations,  May  1983,  pp.  115-121. 

[31]  K.  Dittrich,  and  R.  Lorie,  “Object-oriented  Database  Concepts  for  Engineering 
Applications,  Technical  report  RJ  4691  (50029),  IBM  Research  Laboratory,  San 
Jose,  CA  95193,  May  1985. 

[32]  J.  UUman,  “Principles  of  Database  and  Knowledge-Base  Systems,”  Computer 
Science  Press,  1988. 

[33]  J.  UUman,  “Database  Theory-Past  and  Future,”  Proceedings  of  the  ACM 
SIGACT-SIGMOD  Symposium  on  Principles  of  Database  S5'^stems,  San  Diego, 
March  1987. 

[34]  K.  Morris,  J.  UUman,  and  A.  Van  Gelder,  “Design  Overview  of  the  NAIL!  Sys¬ 
tem,  Proceedings  of  the  International  Logic  Programming  Conference,  1986. 

[35]  S.  Tsur,  and  C.  Zaniolo,  “LDL:  A  Logic-based  Data- Language,”  Proceedings  of 
the  12th  International  Conference  on  Very  Large  Data  Bases,  Kyoto,  August 
1986. 


BIBLIOGRAPHY 


129 


[36]  D.  Chimenti,  A.  O’Hare,  R.  Krishnamurthy,  and  C.  Zaniolo,  “An  Overview  of 
the  LDL  System,”  IEEE  Data  Engineering,  vol.  10,  no.  4,  December  1987. 

[37]  C.  Date,  “The  Outer  Join,”  Proceedings  of  the  2nd  International  Conference  on 
Databases,  Cambridge,  Britain,  September  1983. 

[38]  E.  Codd,  “Extending  the  Relational  Database  Model  to  Capture  More  Meaning,” 
ACM  Transactions  on  Database  Systems,  vol.  4,  no.  4,  December  1979. 

[39]  C.  Date,  “Referential  Integrity,”  Proceedings  of  the  7th  International  Conference 
on  Very  Large  Data  Bases,  Cannes,  France,  September  1981,  pp.  2-12. 

[40]  C.  Date,  “An  Introduction  to  Database  Systems,”  vol.  1,  Fourth  edition, 
Addison- Wesley  Publishing  Company,  Inc.,  1986. 

[41]  F.  Bancilhon,  “Object-Oriented  Database  Systems,”  Invited  lecture.  Proceed¬ 
ings  of  the  7th  ACM  SIGART-SIGMOD-SIGACT  Symposium  on  Principles  of 
Database  Systems.,  Austin,  Texas,  March  1988. 

[42]  D.  Maier,  “Why  Isn’t  There  an  Object-Oriented  Data  Model?,”  Proceedings  of 
the  IFIP  11th  World  Computer  Congress,  San  Francisco,  California,  September 
1989. 

[43]  J.  Joseph,  S.  Thatte,  C.  Thompson,  and  D.  Wells,  “Report  on  the  Object- 
Oriented  Database  Workshop,”  SIGMOD  Record,  vol.  18,  no.  3,  September  1989. 

[44]  W.  Wilkes,  P.  Klahold,  and  G.  Schlageter,  “Complex  and  Composite  Objects  in 
CAD/CAM  Databases,”  Proceedings  of  the  5th  IEEE  International  Conference 
on  Data  Engineering,  Los  Angeles,  February  1989. 

[45]  P.  Buneman,  S.  Davidson,  and  A.  Watters,  “A  Semantics  for  Complex  Objects 
and  Approximate  Queries,”  Proceedings  of  the  ACM  Symposium  on  Principles 
of  Database  Systems,  1988. 


130 


BIBLIOGRAPHY 


[46]  W.  Kim,  J.  Banerjee,  and  H.  Chou,  “Composite  Object  Support  in  an  Object- 
Oriented  Database  System,”  Proceedings  of  the  OOPSLA  International  Confer¬ 
ence  on  Object-Oriented  Programming  Systems,  Languages,  and  Applications, 
October  1987,  pp.  118-125. 

[47]  “IntelliCorp  KEE™  Software  Development  System  User’s  Manual,”  Document 
no.  3.0-U-l,  InteUicorp,  July  1986. 

[48]  R.  Kempf,  and  M.  Stelzner,  “Teaching  Object-Oriented  Programming  with  the 
KEE  System,”  Proceedings  of  the  OOPSLA  International  Conference  on  Object- 
Oriented  Programming  Systems,  Languages,  and  Applications,  October  1987, 
pp.  11  -  25. 


[49]  S.  Khoshafian,  and.  G.  Copeland,  “Object  Identity,”  Proceedings  of  the  OOPSLA 
International  Conference  on  Object-Oriented  Programming  Systems,  Languages, 
and  Applications,  1986. 

[50]  S.  Abiteboul,  and  P.  KaneUakis,  “Object  Identity  as  a  Query  Language  Primi¬ 
tive,”  Proceedings  of  the  ACM  SIGMOD  International  Conference  on  Manage¬ 
ment  of  Data,  Portland,  Oregon,  May-June  1989. 

[51]  G.  Wiederhold,  “Binding  in  Information  Processing,”  Technical  report  no. 
STAN-CS-81-851,  Department  of  Computer  Science,  Stanford  University,  Stan¬ 
ford,  CA  94305,  May  1981. 

[52]  T.  Pratt,  “Programming  Languages:  Design  and  Implementation  (2nd  ed.),” 
Prentice-HaU,  Inc.,  1984. 

[53]  T.  0  Hare,  and  A.  Sheth,  “The  Interpreted— Compiled  Range  of  AI/DB  Systems,” 
Technical  Memo  (unpublished),  Paoli  Research  Center,  Unisys  Corp.,  July  1988. 

[54]  S.  Ceri,  G.  Gottlob,  and  G.  Wiederhold,  ’’Interfacing  Relational  Databases  and 
Prolog  Efficiently”,  IEEE  Transactions  on  Software  Engineering,  February  1989. 


BIBLIOGRAPHY 


131 


[55]  S.  Finkelstein,  “Common  Expression  Analj^sis  in  Database  AppHcations,”  Pro¬ 
ceedings  of  the  of  ACM  SIGMOD  International  Conference  on  Management  of 
Data,  1982. 

[56]  M.  Jarke,  “Common  Subexpression  Isolation  in  Multiple  Query  Optimization,” 
in  W.  Kim,  D.  S.  Reiner,  and  D.  S.  Batory  (eds),  ‘Query  Processing  in  Database 
Systems,’  Springer,  1984,  pp.  191-205. 

[57]  P.  Larson,  and  H.  Yang,  “Computing  Queries  from  Derived  Relations,”  Proceed¬ 
ings  of  the  of  the  11th  VLDB  International  Conference  on  Very  Large  Databases, 
August  1985. 

[58]  T.  Selhs,  “Multiple- Query  Optimization,”  ACM  Transactions  on  Database  Sys¬ 
tems,  vol.  13,  no.  1,  March  1988,  pp.  23-52. 

[59]  A.  Sheth,  D.  Buer,  S.  Russel,  and  S.  Dao,  “Cache  Management  System:  Pre- 
Hminary  Design  and  Evaluation  Criteria,”  Technical  Memo  TM- 8484/ 000/ 00, 
West  Coast  Res.  Center,  Unisys  Corp.,  2400  Colorado  Avenue,  Santa  Monica, 
CA  90406,  October  1988. 

[60]  M.  Nelson,  “Caching  in  the  SPRITE  Network  File  System,”  PhD  Thesis,  Uni¬ 
versity  of  California  at  Berkeley,  March  1988. 

[61]  H.  Wedekind,  and  G.  Zoemtlein,  “Prefetching  in  Real  time  Database  Applica¬ 
tions,”  Proceedings  of  the  of  ACM  SIGMOD  International  Conference  on  Man¬ 
agement  of  Data,  1986. 

[62]  G.  Dill,  “Peripheral  Semiconductor  Storage  —  A  Feasible  Alternative  To  Disk 
and  Tape?,”  Hardcopy,  vol.  7,  no.  1,  January  1987. 

[63]  B.  Lee,  and  G.  Wiederhold,  “Outer  Joins  and  Filters  for  Instantiating  Objects 
from  Relational  Databases  through  Views,”  Technical  Report  no.  30,  Center  for 
Integrated  Facihty  Engineering  (CIFE),  Stanford  University,  May  1990. 

[64]  P.  Dwyer,  and  J.  Larson,  “Some  Experiences  with  a  Distributed  Database 
Testbed  System,”  IEEE  Proceedings,  vol.  75,  no.  5,  May  1987,  pp.  633-648. 


132 


BIBLIOGRAPHY 


Also  appear  in  Gupta,  A.  (ed.),  ‘Integration  of  Information  Systems:  Bridging 
Heterogeneous  Databases,’  IEEE  Press,  1989. 

[65]  A.  Tanenbaum,  “Computer  Networks,”  1st  edition,  Prentice-Hall,  Inc.,  Engle¬ 
wood  Cliffs,  New  Jersey  07632,  1981. 

[66]  A.  Tanenbaum,  “Computer  Networks,”  2nd  edition,  Prentice-Hall,  Inc.,  Engle¬ 
wood  Cliffs,  New  Jersey  07632,  1981. 

[67]  D.  Comer,  “Internetworking  with  TCP/IP:  Principles,  Protocols  and  Architec¬ 
ture,”  Prentice-Hall,  Inc.  1988. 

[68]  S.  Reddy,  NSFnet  Today:  A  New  Implementation  of  a  Vast  Research  Network” 
LAN  Magazine,  June  1989. 

[69]  D.  Cheriton  and  C.  Williamson,  “Network  Measurement  of  the  VMTP  Request- 
Response  Protocol  in  the  V  Distributed  System,”  Proceedings  of  the  ACM  SIG- 
METRICS  Conference  on  Measurement  and  Modeling  of  Computer  Systems, 
Banff,  Alberta,  Canada,’  May  1987,  pp.  216-225. 

[70]  P.  Fischer,  and  S.  Thomas,  “Operators  for  Non-First-Normal-Form  Relations,” 
Proceedings  of  the  IEEE  COMPSAC  International  Computer  Software  and  Ap¬ 
plication  Conference,  November  1983. 

[71]  M.  Roth,  H.  Korth,  and  A.  Silberschatz,  “Theory  of  Non-First-Normal-Form 
Relational  Databases,’  Technical  report  no.  TR-84-36,  Department  of  Computer 
Science,  University  of  Texas  at  Austin,  Austin,  Texas  78712,  December  1984.  < 
in  TODS  on  “Nested  relations”  > 

[72]  S.  Abiteboul,  and  N.  Bidoit,  “Non-First  Normal  Form  Relations  to  Represent 
Hierarchically  Organized  Data,”  Proceedings  of  the  ACM  SIGACT-SIGMOD 
Symposium  on  Principles  of  Database  Systems,  April  1984. 

[73]  D.  Bitton,  “The  Effect  of  Large  Main  Memory  on  Database  Systems,”  Panel 
report.  Proceedings  of  the  ACM  SIGMOD  International  Conference  on  Manage¬ 
ment  of  Data,  1986,  pp.  337-339. 


BIBLIOGRAPHY 


133 


[74]  A.  Ammann,  M.  Hanrahan,  and  R.  Krishnamurthy,  “Design  of  a  Memory  Resi¬ 
dent  DBMS,”  Proceedings  of  the  IEEE  COMPCON  Conference,  San  Francisco, 
February  1985 

[75]  K.  Whang,  et  al.,  “Office-By- Example,  An  Integrated  Office  System  and 
Database  Manager,”  ACM  Transactions  on  Office  Information  Systems,  vol.  5, 
no.  4,  October  1987,  pp.  393-427. 

[76]  T.  Lehman,  and  M.  Carey,  “A  Study  of  Index  Structures  for  Main  Memory 
Database  Management  Systems,”  Proceedings  of  the  of  the  12th  International 
Conference  on  Very  Large  Data  Bases,  Kyoto,  August,  1986,  pp.  294-303. 

[77]  T.  Lehman,  and  M.  Carey,  “Query  Processing  in  Main  Memory  Database  Man¬ 
agement  Systems,”  Proceedings  of  the  ACM  SIGMOD  International  Conference 
on  Management  of  Data,  1986,  pp.  239-250. 

[78]  H.  Garcia-Molina,  R.  Lipton,  and  J.  Valdes,  “A  Massive  Memory  Machine,” 
Proceedings  of  the  IEEE  COMPCON  Conference,  1984. 

[79]  D.  Bitton  and  C.  Turbyfill,  “Performance  Evaluation  of  Main  Memory  Database 
Systems,”  Technical  Report  86-731,  Department  of  Computer  Science,  Cornell 
University,  Ithaca,  New  York,  January  1986. 

[80]  D.  Bitton,  M.  Hanrahan,  and  C.  Turbyfill,  “Performance  of  Complex  Queries  in 
Main  Memory  Database  Systems,”  Proceedings  of  the  IEEE  3rd  International 
Conference  on  Data  Engineering,  1987,  pp.  72-81. 

[81]  M.  Eich,  “MARS:  The  Design  of  a  Main  Memory  Database  Machine,”  Pro¬ 
ceedings  of  the  5th  International  Workshop  on  Database  Machine,  Karuizawa, 
October  1987. 

[82]  A.  Swami,  “Optimization  of  Large  Join  Queries,”  Ph.D.  Thesis,  Report  no. 
STAN-CS-89-1262,  Department  of  Computer  Science,  Stanford  University,  1989. 

» 

[83]  E.  Horowitz,  and  S.  Sahni,  “Fundamentals  of  Data  Structures,”  Computer  Sci¬ 
ence  Press,  Inc.,  1976. 


134 


BIBLIOGRAPHY 


[84]  W.  Mauer,  and  T.  Lewis,  “Hash  Table  Methods,”  ACM  Computing  Surveys,  vol. 
7,  no.  1,  March  1975,  pp.  5-20. 

[85]  K.  Whang,  and  R.  Krishnamurthy,  “Query  Optimization  in  a  Memory-Resident 
Domain  Relational  Calculus  Database  System,”  ACM  Transactions  on  Database 
Systems,  vol.  15,  no.  1,  March  1990. 

[86]  D.  DeWitt,  R.  Katz,  F.  Olken,  L.  Shapiro,  M.  Stonebraker,  and  D.  Wood,  “Imple¬ 
mentation  Techniques  for  Main  Memory  Database  Systems,”  Proceedings  of  the 
ACM  SIGMOD  International  Conference  on  Management  of  Data,  June  1984, 
pp.  1-8. 

[87]  L.  Shapiro,  “Join  Processing  in  Database  Systems  with  Large  Main  Memories,” 
ACM  Transactions  on  Database  Systems,  vol.  11,  no.  3,  September  1986,  pp. 
239-264. 

[88]  D.  Tsichritzis,  and  F.  Lochovsky,  “Data  Models,”  Prentice-Hall,  Inc.  1982,  pp. 
210-225. 

[89]  W.  Kim,  “Architectural  Issues  in  Object-oriented  Databases,”  The  Journal  of 
Object-oriented  Programming,  March  1990. 

[90]  A.  Aho,  J.  Hopcroft  and  J.  UUman,  “The  Design  and  Analysis  of  Computer 
Algorithms,”  Addison- Wesley  Publishing  Company,  1974. 

[91]  D.  Comer,  “The  Ubiquitous  B-Tree,”  ACM  Computing  Surveys,  vol.  11,  no.  2, 
June  1979. 

[92]  R.  Fagin,  J.  Nievergelt,  N.  Pippenger,  and  H.  R.  Strong,  “Extendible  Hashing:  A 
fast  access  method  for  dynamic  files,”  ACM  Transactions  on  Database  Systems, 
vol.  4,  no.  3,  September  1989,  pp.  315-344. 

[93]  W.  Litwin,  “Linear  Hashing:  A  New  Toolf  for  File  and  Table  Addressing,”  Pro¬ 
ceedings  of  the  6th  International  Conference  on  Very  Large  Databases,  Montreal, 
Canada,  October  1980. 


BIBLIOGRAPHY 


135 


[94]  D.  Knuth,  “The  Art  of  Computer  Programming,  Vol  3:  Sorting  and  Searching,” 
Addison-Wesley,  1973. 

[95]  A.  Aho,  J.  Hopcroft,  and  J.  UUman,  “Data  Structures  and  Algorithms,”  Addison- 
Wesley  Pubhshing  Company,  1983. 

[96]  Personal  communications  with  Janet  Miller  et  al.,  the  Iris  project  group,  Hewlett- 
Packard  Company,  Cupertino,  CA,  July  1990. 


NTIS  does  not  permit  return  of  items  for  credit 
or  refund.  A  replacement  will  be  provided  if  an  error 
is  made  in  filling  your  order,  if  the  item  was  received 
in  damaged  condition,  or  if  the  item  is  defective. 


Reproduced  by  NTIS 

National  Technical  Information  Service 
Springfield,  VA  22161 


This  report  was  printed  specifically  for  your  order 
from  nearly  3  million  titles  available  in  our  collection. 


For  economy  and  efficiency,  NTIS  does  not  maintain  stock  of  its  vast 
collection  of  technical  reports.  Rather,  most  documents  are  printed  for 
each  order.  Documents  that  are  not  in  electronic  format  are  reproduced 
from  master  archival  copies  and  are  the  best  possible  reproductions 
available.  If  you  have  any  questions  concerning  this  document  or  any 
order  you  have  placed  with  NTIS,  please  call  our  Customer  Service 
Department  at  (703)  487-4660. 

About  NTIS 

NTIS  collects  scientific,  technical,  engineering,  and  business  related 
information  —  then  organizes,  maintains,  and  disseminates  that 
information  In  a  variety  of  formats  —  from  microfiche  to  online  services. 
The  NTIS  collection  of  nearly  3  million  titles  includes  reports  describing 
research  conducted  or  sponsored  by  federal  agencies  and  their 
contractors;  statistical  and  business  information;  U.S.  military 
publications;  audiovisual  products;  computer  software  and  electronic 
databases  developed  by  federal  agencies;  training  tools;  and  technical 
reports  prepared  by  research  organizations  worldwide.  Approximately 
100,000  new  titles  are  added  and  indexed  into  the  NTIS  collection 
annually. 


For  more  information  about  NTIS  products  and  services,  call  NTIS 
at  (703)  487-4650  and  request  the  free  NTIS  Catalog  of  Products 
and  Services,  PR-827LPG,  or  visit  the  NTIS  Web  site 
http:/Aivww.ntis.gov. 


NTIS 

Your  indispensable  resource  for  government-sponsored 
information — U.S.  and  worldwide 


